Native Language Support 
HP-UX Concepts and Tutorials 

HP 9000 Series 300/800 Computers 

HP Part Number 97089-90058 






Hewlett-Packard Company 

3404 East Harmony Road, Fort Collins, Colorado 80525 



Legal Notices 



The information contained in this document is subject to change without 
notice. 

Hewlett-Packard makes no warranty of any kind with regard to this manual, 
including, but not limited to, the implied warranties of merchantability and 
fitness for a particular purpose. Hewlett-Packard shall not be liable for 
errors contained herein or direct, indirect, special, incidental or consequential 
damages in connection with the furnishing, performance, or use of this 
material. 

Warranty. A copy of the specific warranty terms applicable to your 
Hewlett-Packard product and replacement parts can be obtained from your 
local Sales and Service Office. 



Copyright © Hewlett-Packard Company 1989 

This document contains information which is protected by copyright. All rights 
are reserved. Reproduction, adaptation, or translation without prior written 
permission is prohibited, except as allowed under the copyright laws. 

Restricted Rights Legend. Use, duplication or disclosure by the U.S. 
Government Department of Defense is subject to restrictions as set forth in 
paragraph (b)(3)(ii) of the Rights in Technical Data and Software clause in
FAR 52.227-7013. 



Use of this manual and flexible disc(s) or tape cartridge(s) supplied for this 
pack is restricted to this product only. Additional copies of the programs can 
be made for security and back-up purposes only. Resale of the programs in 
their present form or with alterations, is expressly prohibited. 

Copyright © AT&T, Inc. 1980, 1984, 1986 

Copyright © The Regents of the University of California 1979, 1980, 1983, 
1985 

This software and documentation is based in part on the Fourth Berkeley 
Software Distribution under license from the Regents of the University of 
California. 



Printing History 

The manual printing date and part number indicate its current edition. The 
printing date will change when a new edition is printed. However, minor 
changes may be made at reprint without changing the printing date. The 
manual part number will also change when extensive changes are made. 

Manual updates may be issued between editions to correct errors or to 
document product changes. To ensure that you receive updates or new 
editions, you may subscribe to the appropriate product support service, 
available from your HP sales representative. 

September 1989. First Edition 






Contents

1. Using this Manual
     Typographical Conventions in This Manual
     Related HP-UX Manuals

2. Introduction to NLS
     Overview of Software Internationalization
     What is Native Language Support (NLS)?
     Aspects of NLS Support
     Character and Text Handling
     Comparing Strings and Comparing Characters
     Regular Expressions
     Local Customs and Conventions
     Messages

3. Using International Software
     NLS Environment Variables
     Setting Your Environment
     Setting Your Terminal
     Reference Information for Internationalized Commands
     Internationalized Messages
     Using Internationalized Commands

4. Developing International Software
     General Programming Issues
     Initializing NLS
     Recommended Initialization
     Data Integrity
     Programming with Multi-byte Characters
     Programming with Wide-Characters
     Conversion of Existing Programs
     Character and String Processing
     Conversion of Existing Programs
     Creating and Using a Message Catalog System
     Programming for Messages
     Opening a Message Catalog
     Search Path and Naming Conventions
     Retrieving Messages
     Closing a Message Catalog
     Default Messages
     Compiling and Linking
     Creating a New Message Catalog
     The Message Text Source File
     Compiling a Message Catalog
     An Example of Programming with Message Catalogs
     Special Considerations
     Libraries with Messages
     Conversion of Existing Programs for NLS Messaging
     Testing a Message Catalog
     Installing a Message Catalog
     Source Code Management
     Keeping nl_prog.c Files
     Multi-file Programs
     Adding a Message to a Messaging Program
     Using "make" Files
     Guidelines for Using Messaging

5. Administering International Software
     Finding NLS Files
     The Default User Environment
     Terminal Configuration
     Installing Message Catalogs
     Installing Optional Locales
     Peripheral Configuration
     European Character Sets
     Katakana Character Sets
     Other 8-bit HP Character Sets
     16-bit HP Character Sets
     Non-HP 7-Bit Character Sets

6. Localizing International Software
     Localizing the User Environment
     Localizing Message Catalogs
     The C Locale Messages
     Preparing for Translating Messages
     Installing Localized Messages
     Creating a Locale

7. Advanced NLS Topics
     Codeset Conversion
     Processing Right-to-Left Languages
     Locale Information
     Initialization
     Special Locales
     Special Message Catalogs
     Default Message Catalogs
     Programs That Call Exec
     Messaging: printf/scanf Data Formatting

A. Examples of Internationalized Software
     Example 1: Rtlcat
     Example 2: Makefile

B. NLS References

C. Previous Usage

D. Languages and Codesets

Glossary

Index



1. Using this Manual



This manual is for people who are using, writing, or translating programs in 
a multi-lingual environment and who will need to make use of the various 
elements of Native Language Support (NLS). 

You will find specific sections of this manual to be written at a technical level 
appropriate to general users, system administrators, NLS coordinators, and 
applications programmers. 

■ General Users should read Chapters 2 and 3. 

■ System Administrators and NLS Coordinators should read Chapters 5 and 6.

■ Applications Programmers should read Chapters 4, 6, and 7.

For further details, refer to the Appendices. For example, Appendix A provides 
examples of internationalized programming. All of the NLS commands and 
subroutines discussed in this manual are referenced in Appendix B and in the 
Index. 






To find this information                                Please see

Using this Manual: This chapter explains the            Chapter 1
typographical conventions used in this manual and
identifies other manuals referenced in the contents.

Introduction to NLS (for the general user): This        Chapter 2
chapter presents a basic description of the scope
of Native Language Support, localization, and
internationalization, including general aspects of
character set handling, local conventions, messages,
and internationalization.

Using International Software (for the general user):    Chapter 3
This chapter shows how to run a localized application
including terminal configuration, environment setup,
and selection of the language.

Developing International Software (for the              Chapter 4
programmer): This chapter describes the
initialization process, character and string processing,
and gives a brief introduction to setting up the
message interface.

Administering International Software (for the system    Chapter 5
administrator and the NLS coordinator): This chapter
identifies the HP-UX directories and files, how to set
up the user environment, installing message catalogs
and optional locales, and configuring terminals and
peripherals.

Localizing International Software (for the system       Chapter 6
administrator and the NLS coordinator): This chapter
explains the details of localization for special user
requirements, localizing message catalogs, and creating
specialized locales.

Advanced NLS Features (for the programmer): This        Chapter 7
chapter explains the NLS character- and
string-processing tools, processing non-Latin character
input/output, and special treatment of locales and
message catalogs.

Examples of Internationalized Software: Character       Appendix A
processing, collation, monetary formatting, messaging,
and date/time.

NLS References: An alphabetic listing of HP-UX          Appendix B
Reference locations for all NLS commands and
routines.

Previous Usage: Tables of current and obsolete NLS      Appendix C
commands and routines.

Languages and Codesets: A listing of the native         Appendix D
languages that are supported by HP codesets.

Definitions: Major words and concepts used in this      Glossary
manual.



Typographical Conventions in This Manual

Italics           This typography indicates manual names and references
                  to manual pages in the HP-UX Reference. Italics are
                  also used for symbolic items either typed by the user or
                  displayed by the system, as discussed below under Variable
                  name.

New Terms         This typography is used when an important new term is
                  introduced.

Computer literal  This typography indicates literal input to, or output from,
                  the computer. Type the characters in this font exactly as
                  they appear on the page. For example:

                  findstr prog.c > prog.str

Variable name     This typography indicates that you need to "fill in the
                  blank" in a command line with your own word or data.
                  This font is used for names of variables and symbolic
                  names. For example:

                  cat file-name

                  means you type cat and substitute the appropriate
                  file-name to complete the command line.

(keycap)          This typography indicates a key on your keyboard. For
                  example, (Return) means to press the "Return" key. When
                  prefixed by (shift), (ctrl), or (Extend char), press both keys
                  simultaneously. For example:

                  (ctrl)-(c)

                  means you press the (ctrl) key and continue to hold it
                  while you press the (c) key.



Related HP-UX Manuals

This manual may be used in conjunction with other HP-UX documentation.
References to these manuals are included, where appropriate, in the text.

■ The HP-UX Reference contains the syntactic and semantic details of all
commands and application programs, system calls, subroutines, special files,
file formats, miscellaneous facilities, and maintenance procedures available on
the HP 9000 HP-UX Operating System.

■ The HP-UX Portability Guide documents the guidelines and techniques for
maximizing the portability of programs written on and for HP 9000 Series
200, 300, and 500 computers running the HP-UX operating system. It
covers the portability of high-level source code (C, Pascal, FORTRAN) and
transportability of data and source files between commonly used formats.

■ HP-UX System Administration Tasks provides step-by-step instructions
for installing and updating the HP-UX Operating System software and for
installing the NLS languages, if they are optional for your system. It also
explains procedures for system boot and login, and contains guidance for
implementing administrative tasks.

■ HP-UX Concepts and Tutorials: Facilities for Series 200, 300, and 500
contains valuable guidance for setting up your terminal and configuring the
softkey definitions.

■ The NLIO System Administrator's Guide provides installation and
configuration procedures for NLIO, which is the set of servers and filters used
to input and output 16-bit characters efficiently on 16-bit hardware. It also
contains fileset and font descriptions for the supported languages.

■ The Native Language I/O Access User's Guide describes how to use the
NLIO system.

■ The NLIO Code Books and the NLIO Input Method Guides describe the
language-specific codes and the input methods used for the supported
languages.

■ Finding HP-UX Information (2 vols.) provides a cross-index, detailed
descriptions, and part numbers for the Series 300 and 800 HP-UX manuals.

Unless otherwise stated, all references in this manual such as "see
langinfo(3C) for more details" refer to entries in the HP-UX Reference
manual.



2. Introduction to NLS



Overview of Software Internationalization 

The users of HP-UX speak many different languages and observe many 
different cultural practices. Local language-processing capability is becoming a
high priority with the kinds of software products which are increasingly in use 
throughout the world. For this reason, we have found that users need software 
which will easily accommodate local conventions. 

To do so effectively, software products are required to preserve the integrity of 
data, correctly handle the written conventions of a variety of languages, and 
provide a message interface in the user's language. In addition, they must be 
versatile in handling a variety of local data-formatting conventions. 

There are two processes involved in the NLS approach to enhancing software 
for international use: 

■ The process of internationalizing software includes supporting the letters and 
symbols required to read and write the user's language, processing characters 
and text according to the rules of the user's language, providing for 
translated messages and prompts, and changing functions and conventions to 
comply with local requirements. For a number of reasons, it is also desirable 
that such internationalization be accomplished with a minimum of change to 
the program code itself. 

■ The process of localizing adapts the software to a particular locale, including 
the translation of messages and the use of appropriate language tables on the 
local system. 






In general, then, the main requirements which Hewlett-Packard has addressed 
to facilitate the international use of software are: 

■ Preserving the integrity of the data 

■ Proper handling of characters 

■ Appropriate message interfacing in the local language 

■ Proper representation of local customs in the software 

HP Native Language Support provides an extensive set of tools and routines 
for implementing language-independent software. Software can, with relatively 
minor modifications, use language-dependent processing information which is 
stored externally to the program code. At run time, the application accesses 
the processing information appropriate for the language then specified. There 
are some unique advantages to this NLS strategy: 

■ Software is not duplicated in different versions for different languages. This 
makes it easier to update and maintain the program. 

■ Because all language-dependent processing information is kept external to 
the program source, programmers need not modify the program source 
when modifying messages. The chance of "bugs" being introduced into the 
software as a result of this process is eliminated. 

■ Since software can be localized more easily, the time and expense required by 
localization is relatively low. 

■ Many users could simultaneously share the same copy of a program, 
with each one potentially using a different language or set of language 
conventions. 

Hewlett-Packard's Native Language Support has been adopted as the basis for
the X/Open Portability Guide (XPG) Issue 3. HP has an ongoing test process
to ensure compliance with applicable standards of POSIX and ANSI-C.



What is Native Language Support (NLS)? 

NLS provides a number of features to aid the international user: 

■ It permits users to specify the desired language at run time. 

■ It allows different users to use different languages on the same system. 

■ It provides the programmer with the ability to internationalize software. 

NLS supports these features by providing language-dependent tables for various 
locales and by the processes of program internationalization and localization. 
Internationalization involves: 

■ The replacement of the original HP-UX routines in an application with NLS
versions of the routines. For example, the routine ctime would be replaced
with the NLS-enhanced version strftime.

■ The provision of tools for copying all hard-coded messages into external 
message catalogs and for updating the message catalogs. 

Localization then adapts the internationalized software application or system 
for use in a specific linguistic environment. This includes translating the text in 
the message catalogs into the local language. 

Message catalogs and language tables can be specified at run time, rather than 
having the messages compiled into the programs. For a given piece of software, 
this message cataloging process only needs to be done once. 
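As an illustration of replacing ctime with strftime, the following is a minimal sketch rather than code taken from an HP-UX program. The helper name format_date is hypothetical. It assumes the ANSI C strftime routine and its "%x" conversion, which asks for the date representation of whatever locale is currently in effect.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* format_date is a hypothetical helper, not an NLS routine.
 * It writes the given calendar date into out using the date format
 * of the current locale and returns the number of bytes written
 * (0 if the buffer is too small). */
size_t format_date(char *out, size_t outsz, int year, int month, int day)
{
    struct tm when;

    memset(&when, 0, sizeof when);
    when.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    when.tm_mon  = month - 1;     /* struct tm counts months from 0 */
    when.tm_mday = day;

    /* "%x" selects the locale's own date representation, so the
     * program source does not hard-code a field order or delimiter. */
    return strftime(out, outsz, "%x", &when);
}
```

In the default C locale, format_date(buf, sizeof buf, 1986, 10, 7) produces 10/07/86. With another locale in effect the same call is expected to follow that locale's conventions, with no change to the program source.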







Figure 2-1. Hewlett-Packard Localization Centers 

Localization and internationalization can often be facilitated by Localization 
Centers operated by various Hewlett-Packard Country Product Organizations, 
some of which are shown above. 

Aspects of NLS Support 

There are three aspects of Native Language Support included in HP-UX 
software: 

■ Character and text handling 

■ Local customs and conventions 

■ Messages 

Character and Text Handling 

NLS provides the ability to identify and manipulate characters in a variety of 
ways and to handle language-specific processing of text: 

■ Character Sets. In an HP-UX environment, the default local language
character set is 7-bit ASCII (or USASCII). All programs which are not
internationalized, or those that are internationalized but in which the user
has not enabled NLS, use this character set. Note, however, that 7-bit
ASCII is not even sufficient to span the Latin-based alphabet used in many
European languages. Moreover, for many Asian languages, character sets can
contain several thousand members. This is more than can be encoded in
the single 8-bit number which is the conventional value used to represent
character data. For this and other reasons, NLS character-handling has the
following characteristics:

□ The 8th bit of a character byte is never stripped or modified. 

□ The extra bit is used to support languages that have additional characters, 
accented vowels, consonants with special forms, and special symbols. 

□ Multi-byte characters may be used for character codesets which are 
exceptionally large. 

There are many implementations of non-ASCII character sets currently in 
use. NLS permits users to define their own character sets and character 
properties. However, HP has already defined character sets which permit the 
processing of several European, Middle Eastern, and Asian languages. 

For European and Middle Eastern languages, HP has defined a series of
8-bit character sets. Every HP 8-bit character set is a superset of ASCII.
The HP-supported 8-bit character set for Western European languages is
ROMAN8. Other 8-bit character sets are defined for other locales. For a
listing, please refer to Appendix D, "Languages and Codesets".

For alphabets of more than 256 characters, such as Kanji (a Japanese
ideographic character set), multi-byte character codes are required. HP has
defined a multi-byte character encoding scheme, HP-15, which uses two bytes
(16 bits) to represent a character. Four sets are defined under this scheme,
which are used to represent Traditional Chinese, Simplified Chinese, Korean,
and Japanese. In addition, HP provides support for the Japanese UJIS
character set. These are used for data processing and storage. For input
and output, HP uses a multi-byte character encoding scheme called HP-16.
Appendix D lists both single- and multi-byte codesets available from HP.

Users can also define their own languages using buildlang with non-HP
defined codesets. For more information on buildlang, see Chapter 6,
"Localizing International Software", in this manual.






■ Character type and Conversion. All sorting, case shifting, and type analysis
of characters is done according to the local conventions for the native
language selected. While the ROMAN8 character set has uppercase and
lowercase for most alphabetic characters, some languages discard accents
when characters are shifted to uppercase. European French commonly
discards accents in uppercase, while Canadian French does not. If there is
no representation of case in the user's language, as is the case in ideographic
languages such as Japanese, characters are not shifted at all.

■ Collation. Each language may use its own distinct "collating sequence"
(the sequence in which characters or words are ordered by the computer).
Some languages may even have more than one set of collation rules. The
ASCII collation order, which is the default setting for HP-UX, is fast but
inadequate even for the accuracy requirements of American dictionary
sorting. Each language may order the characters in its character set
differently, and certain character sets have multiple acceptable orderings.

Chinese is an example in which the ideographic characters can be sorted in 
order of: 

□ The numeric value of the character as represented in a computer character 
set 

□ The number of strokes required to represent the character 

□ The radical (root) of the character 

□ The number of strokes added to the radical 
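The character type and conversion behavior described above can be sketched with the ANSI C toupper routine. This is a minimal illustration, not an HP-supplied routine, and the name shift_to_upper is hypothetical. It assumes that the case mapping is supplied by the locale currently in effect, so the source contains no table of accented letters, and it passes each byte as an unsigned char so that the 8th bit is never stripped.

```c
#include <ctype.h>

/* shift_to_upper is a hypothetical helper.
 * It shifts a single-byte string to uppercase in place, using the
 * case mapping of the locale currently in effect. */
void shift_to_upper(char *s)
{
    for (; *s != '\0'; s++) {
        /* The cast keeps bytes above 0x7F in the valid argument
         * range for toupper instead of sign-extending them. */
        *s = (char)toupper((unsigned char)*s);
    }
}
```

In the default C locale only the ASCII letters are shifted and bytes with the 8th bit set pass through unchanged. Whether an accented letter is shifted, and to what, is left to the locale's tables.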

Comparing Strings and Comparing Characters 

The order into which character strings are sorted is language-dependent. 
Traditionally, most comparative ordering is based on ASCII values. But, with 
the extension of the ASCII character set to ROMAN8 for support of other 
languages, the ordering of a character within a character set no longer coincides 
with the character's traditional alphabetical order. 

For example, "ä" follows "b" in the character set ordering but is sorted before
"b" in many cases. This situation makes sorting based on the code of each
character inappropriate when internationalizing software.

In addition, sorting based on character code does not provide true dictionary
sorting even in the case of the ASCII character set. Dictionary order sorts "a"
after "A" and before "B", whereas ASCII-based order sorts "a" after both "A"
and "B". The following is an example of sorting the same list based on the C
sorting method, and based on a German sorting method.



Table 2-1. Sorting Example: C vs. German

Sorted by C rules      Sorted by German rules

Airplane               Airplane
Zebra                  äpfel
bird                   bird
car                    car
äpfel                  Zebra
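The difference between the two columns can be sketched in C as a choice of comparison routine. This is a minimal illustration under stated assumptions: sort_words is a hypothetical helper, strcmp stands for comparison by character code, and the ANSI C strcoll routine stands for comparison by the collating sequence of the locale currently in effect.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two strings by the numeric code of each character. */
static int by_code(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Compare two strings by the current locale's collating sequence. */
static int by_locale(const void *a, const void *b)
{
    const char *const *pa = a;
    const char *const *pb = b;
    return strcoll(*pa, *pb);
}

/* sort_words is a hypothetical helper: it sorts an array of strings
 * in place, by locale rules if use_locale is nonzero. */
void sort_words(const char **words, size_t count, int use_locale)
{
    qsort(words, count, sizeof *words, use_locale ? by_locale : by_code);
}
```

With code-based comparison, "Zebra" sorts before "bird" because every uppercase ASCII letter has a lower code than every lowercase one. The locale-based comparison gives a different order only when a locale with its own collating sequence has been selected.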



Beyond the ordering of individual characters, some languages designate that 
certain characters be treated in a special way. For example, in some languages 
groups of characters are clustered and treated as a single character. In Spanish
"ll" is treated as a single character, and it is sorted after "l" and before "m".
Similarly, the "ch" in Spanish is treated as a single character, and it is sorted
after "c" but before "d":



Table 2-2. Sorting Example: C vs. Spanish

Sorted by C rules      Sorted by Spanish rules

chaleco                cuna
cuna                   chaleco
día                    día
llava                  loro
loro                   llava
maiz                   maiz
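The Spanish ordering in Table 2-2 can be sketched as a comparison over collating elements rather than over single characters. The code below is a simplified illustration only: spanish_compare and its weight scheme are hypothetical, it handles unaccented lowercase ASCII letters, and it does not represent how the NLS collation tables are actually implemented.

```c
#include <stddef.h>

/* Read one collating element at *s, advance *s past it, and return
 * its weight. Each letter gets an even weight; the two-character
 * elements "ch" and "ll" get the odd weight just after "c" and "l".
 * The arithmetic assumes 'a'..'z' have consecutive codes (ASCII). */
static int next_weight(const char **s)
{
    const char *p = *s;

    if (p[0] == 'c' && p[1] == 'h') {
        *s = p + 2;
        return 2 * ('c' - 'a') + 1;
    }
    if (p[0] == 'l' && p[1] == 'l') {
        *s = p + 2;
        return 2 * ('l' - 'a') + 1;
    }
    *s = p + 1;
    if (p[0] >= 'a' && p[0] <= 'z')
        return 2 * (p[0] - 'a');
    return 1000 + (unsigned char)p[0];  /* anything else sorts last */
}

/* spanish_compare is a hypothetical helper: it returns a negative,
 * zero, or positive value, in the manner of strcmp. */
int spanish_compare(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        int wa = next_weight(&a);
        int wb = next_weight(&b);
        if (wa != wb)
            return wa < wb ? -1 : 1;
    }
    if (*a == '\0' && *b == '\0')
        return 0;
    return *a == '\0' ? -1 : 1;   /* the shorter string sorts first */
}
```

Under this sketch "cuna" compares before "chaleco" and "loro" before "llava", matching the right-hand column of the table.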






When sorting strings in some languages, a single character is expanded and
treated as if it were really two characters. For example, when sorting strings in
German, ß (the "sharp s") is treated as if it were ss.



Table 2-3. Sorting Example: C vs. German

Sorted by C rules      Sorted by German rules

Rosselenker            Rosselenker
Rostbratwurst          Roßhaar
Roßhaar                Rostbratwurst



In some languages, certain characters such as "-" are ignored when collating 
strings, and these also need to be taken into account. 
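The one-to-two expansion of the sharp s can be sketched as follows. This is a minimal illustration, not the NLS implementation: german_compare is a hypothetical helper, the code for the sharp s is passed in by the caller because it depends on the character set in use, and the 256-byte working buffers are an arbitrary limit chosen for the sketch.

```c
#include <string.h>

/* Copy in to out, replacing each sharp-s byte with "ss".
 * Returns 0 on success, -1 if out is too small. */
static int expand(const char *in, unsigned char sharp_s,
                  char *out, size_t outsz)
{
    size_t pos = 0;

    for (; *in != '\0'; in++) {
        if ((unsigned char)*in == sharp_s) {
            if (pos + 2 >= outsz)
                return -1;
            out[pos++] = 's';
            out[pos++] = 's';
        } else {
            if (pos + 1 >= outsz)
                return -1;
            out[pos++] = *in;
        }
    }
    out[pos] = '\0';
    return 0;
}

/* german_compare is a hypothetical helper: it compares two strings
 * after expansion and falls back to a plain code comparison if
 * either string does not fit the working buffer. */
int german_compare(const char *a, const char *b, unsigned char sharp_s)
{
    char ea[256];
    char eb[256];

    if (expand(a, sharp_s, ea, sizeof ea) != 0 ||
        expand(b, sharp_s, eb, sizeof eb) != 0)
        return strcmp(a, b);
    return strcmp(ea, eb);
}
```

For example, if the sharp s is assumed to have the code 0xDE, german_compare("Ro\xDEhaar", "Rostbratwurst", 0xDE) is negative, which places "Roßhaar" before "Rostbratwurst" as in Table 2-3, whereas a plain strcmp places it after.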

■ Data directionality. This is the spatial order in which data is displayed vs.
the order in which it is entered. Data directionality is not the same for all
languages. For example, some Middle Eastern languages are read from right
to left and may be mixed with insertions in left-to-right European languages.
NLS allows for processing of this type of character data. Currently, no
special provisions are made for top-to-bottom languages, such as Chinese,
which are handled in a left-to-right orientation.

■ Multi-byte characters. Finally, character handling also involves the correct
parsing of multi-byte character streams and the interpretation of multi-byte
characters. Multi-byte character streams may contain both single-byte and
multi-byte characters. To process this data, each byte must be identified
as either a single-byte character or as part of a multi-byte character. The
details of these and other aspects of character handling are discussed in the
chapter "Developing International Software", in this manual.
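The parsing requirement for mixed streams can be sketched as follows. This is an illustration under a loudly stated assumption: the rule in starts_two_byte (any byte with the 8th bit set begins a two-byte character) is hypothetical and is not the HP-15 or HP-16 definition. The helper count_characters is likewise hypothetical.

```c
#include <stddef.h>

/* HYPOTHETICAL rule for this sketch only: a byte with the 8th bit
 * set is the first byte of a two-byte character. Real codesets
 * define their own first-byte and second-byte ranges. */
static int starts_two_byte(unsigned char c)
{
    return c >= 0x80;
}

/* Count the characters in a stream that mixes single-byte and
 * two-byte characters. Returns -1 if the stream ends in the middle
 * of a two-byte character. */
long count_characters(const unsigned char *s, size_t nbytes)
{
    size_t i = 0;
    long count = 0;

    while (i < nbytes) {
        if (starts_two_byte(s[i])) {
            if (i + 1 >= nbytes)
                return -1;      /* second byte is missing */
            i += 2;             /* consume both bytes as one character */
        } else {
            i += 1;
        }
        count++;
    }
    return count;
}
```

Note that the second byte of a two-byte character may have the same value as an ASCII letter. A program that examines bytes one at a time, without tracking its position in the stream, would misread that byte as a separate character.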

Regular Expressions 

HP-UX allows the specification of arbitrary character strings through the use of 
regular expressions. For further details on their use, see the section, "Regular 
Expressions", in Text Editors and Processors, HP-UX Concepts and Tutorials.
The syntax of regular expressions has been extended in HP-UX to allow use 
with other character sets. 






Here is one example of an internationalized regular expression:

    h[[=e=]]lp

This matches the word "help" spelled with any variation of the letter "e" (e, é,
è, ê, etc.).

The existing syntax of a range expression (e.g., "[a-z]") is not changed. 
However, its meaning has been extended to mean "match any collating element 
which falls between the two given collating elements based on the current 
locale's LC_COLLATE collation sequence." 

For multi-byte languages, the support in regular expressions is not so extensive. 
For example, multi-byte characters are allowed as single character elements 
in these expressions, and they can be used in character ranges. However, the
inverse of a range ("[^a-z]") is not allowed with multi-byte characters in
general. This is due to restrictions in the way the codesets are implemented. 
Moreover, some new features are not allowed with multi-byte codesets simply 
because they have no application to Asian languages. 

Local Customs and Conventions 

Some aspects of NLS relate also to the local customs or conventions of a 
particular geographic area. These aspects, even when supported by a common 
character set, change from region to region. Consequently, number format, 
currency information, date and time, case shifting, and collation are presented 
according to the user's local conventions. In NLS, all these environmental
characteristics are called the "locale".

For instance, although Great Britain, the United States, Canada, Australia, 
and New Zealand share the English language, aspects of data representation 
differ according to local customs. Variations are encountered in the following 
everyday matters: 

■ Representation of numbers (numeric formatting) 

■ Representation of currency units (monetary formatting) 

■ Display of time 

■ Display of days, weeks, months 

■ Numeric Formatting. In the representation of numbers, all the following
depend on local customs:

□ The "radix" symbol which performs the decimal-indicating function (the
period in the U.S.)

□ The digit grouping symbol (the comma in the U.S.) which serves to
separate groups of integers

□ The convention for grouping integer digits (by threes in the U.S.)

In the U.S., a number is represented as follows:

2,345.678 

But when representing the same number in France, the decimal point and 
the digit-grouping symbol are reversed: 

2.345,678 

■ Monetary Formatting. Currency units and how they are subdivided vary 
with region and country. The symbol for a currency unit can change as well 
as the placement of the symbol. It can precede the numeric value, follow it, 
or appear within it. 

Between the currency conventions used by the U.S. and France, the symbols 
are transposed. 

$2,345.77 

versus 

2.345,77 FF 

■ Display of Time. Computation and proper display of time, including 24-hour
vs. 12-hour clocks, must be considered. The HP-UX system clock runs on
Coordinated Universal Time (UTC). Corrections to local time zones consist of
adding or subtracting whole or fractional hours from UTC. Some regions,
instead of using the Western Gregorian calendar system, designate the
years by seasonal, astronomical, or historical events. One system which HP
supports is the Imperial system used in Japan for numbering years based on
the reign of the ruling emperor.



■ Display of Days, Weeks, Months. 

Names for days of the week and months of the year may vary with language. 
Rules for abbreviating these also differ. The order of the year, month, and 
day, as well as the separating delimiters, are not universally defined. For 
example, October 7, 1986 would be represented in the U.S. as: 

10/7/86 

in Germany, it would be represented as: 

7.10.1986 

and in the U.K. as: 

7/10/86 

The chapter "Advanced NLS Features" in this manual describes the library
routines used to handle these local customs.
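The numeric formatting conventions described above can be sketched in C. The following is a minimal illustration, not an NLS routine: format_number is a hypothetical helper that takes the digit grouping symbol and the radix symbol as parameters and always groups by threes. In a real program these symbols would normally be obtained from the locale (ANSI C provides localeconv for this purpose) rather than passed in by hand.

```c
#include <stdio.h>
#include <string.h>

/* format_number is a hypothetical helper.
 * It writes whole with a grouping symbol between every three integer
 * digits, then the radix symbol, then the caller's fraction digits.
 * Returns 0 on success, -1 if out is too small. */
int format_number(char *out, size_t outsz, unsigned long whole,
                  const char *fraction, char group, char radix)
{
    char digits[32];
    char buf[64];
    size_t pos = 0;
    int len;
    int i;

    len = snprintf(digits, sizeof digits, "%lu", whole);
    if (len < 0 || (size_t)len >= sizeof digits)
        return -1;

    for (i = 0; i < len; i++) {
        int remaining = len - 1 - i;   /* digits still to be written */

        buf[pos++] = digits[i];
        /* A separator goes wherever a multiple of three remains. */
        if (remaining > 0 && remaining % 3 == 0)
            buf[pos++] = group;
    }
    buf[pos++] = radix;
    buf[pos] = '\0';

    if (pos + strlen(fraction) + 1 > outsz)
        return -1;
    strcpy(out, buf);
    strcat(out, fraction);
    return 0;
}
```

Calling format_number(out, sizeof out, 2345, "678", ',', '.') yields 2,345.678, and exchanging the two symbols yields 2.345,678. The number itself is unchanged; only its presentation differs.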

Messages 

The ability to customize messages for different countries is an important aspect 
of using NLS. NLS enables you to choose the language to be used for prompts, 
responses to prompts, and error messages. All of this can be done at run time. 
And, since messages are kept in catalogs separate from the program code, it is 
not necessary to recompile the source code when you are using the program in 
another language. 

It is, however, necessary to work closely with your translator to ensure that 
the semantics of system or program messages is correctly conveyed in the 
translation. In practice, the syntax of another language may force a change in 
the sentence structure of a translated message. 

For example, an English message for a given command might be interpreted 
two ways in German. 

The original in English is:

    cannot read at directory

("at" is an HP-UX command.)

In German, this message could be interpreted as:

    Kann das Verzeichnis nicht lesen.

(Literally: "cannot read the directory", with "at" misinterpreted as an
untranslatable preposition)

If the meaning of "at" is pointed out to the translator in a "cookbook" 
accompanying the message catalog, the message would be correctly translated 
as: 

at Verzeichnis nicht lesbar. 

(Literally: "'at' directory not readable." — the intended meaning) 

Handling messages in message catalogs helps ensure that the messages are 
accessible for editing, updating, and translating into other languages, as 
required. 

For details on the use of message catalogs, see the section "Localizing Message
Catalogs" in the chapter "Localizing International Software", in this manual.
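As a sketch of how a program might retrieve a message at run time instead of compiling it in, the following assumes the X/Open message catalog interface (catopen, catgets, and catclose, declared in nl_types.h). The helper get_message and the catalog, set, and message numbers passed to it are hypothetical. The interface described elsewhere in this manual may differ in detail.

```c
#include <nl_types.h>
#include <stdio.h>

/* get_message is a hypothetical helper.
 * It copies message msg_id of set set_id from the named catalog into
 * out. If the catalog or the message cannot be found, the fallback
 * text is used instead. Returns 1 if the catalog was opened, else 0. */
int get_message(const char *catalog, int set_id, int msg_id,
                const char *fallback, char *out, size_t outsz)
{
    nl_catd catd = catopen(catalog, 0);
    const char *text;

    if (catd == (nl_catd)-1) {
        /* No catalog available: use the default message. */
        snprintf(out, outsz, "%s", fallback);
        return 0;
    }

    text = catgets(catd, set_id, msg_id, fallback);
    /* Copy before closing; the returned pointer may refer to
     * storage owned by the catalog. */
    snprintf(out, outsz, "%s", text);
    catclose(catd);
    return 1;
}
```

Because the default text is supplied with every request, the program still produces a usable message when no translated catalog has been installed.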






3. Using International Software



Read this chapter if you are: 

■ A general user of internationalized commands and software 

This chapter covers information and tasks you will need to deal with in order 
to use NLS commands successfully. The information and the tasks are minimal 
because, in most situations, you will be receiving help from two people on your 
staff: 

■ Your local NLS Coordinator, whose tasks are: 

□ Advising on optimal use of NLS features 

□ Ordering NLS software 

□ Communicating special configuration needs to your System Administrator 

□ Installing message catalogs 

□ Coordinating translation activities 

■ Your System Administrator, whose tasks are: 

□ Installing and updating the operating system 

□ Configuring the system 

□ Installing and initializing additional software 

□ Maintaining system software 

Your System Administrator should have already provided the appropriate 
configuration and initialization. 



NLS Environment Variables 

Internationalized commands adapt their behavior to that specified by a set 
of NLS environment variables. These variables indicate user requirements for 
various aspects of NLS capabilities: 



    LANG           Specifies native language, local customs, coded
                   character set, and messages.

    LC_COLLATE     Specifies string collation.

    LC_CTYPE       Specifies character classification and case conversion.

    LC_MONETARY    Specifies currency symbol and monetary value format.

    LC_NUMERIC     Specifies decimal number format.

    LC_TIME        Specifies date and time format and the names of days
                   and months.

    NLSPATH        Specifies search path for message catalogs.

    LANGOPTS       Specifies data directionality for right-to-left languages.



By setting these variables, you can cause internationalized software to perform
in a manner appropriate to your particular needs. For additional information
on the NLS environment variables and their use, see environ(5) in the HP-UX
Reference.



Setting Your Environment 



Local system default values for the NLS environment variables are ordinarily 
determined by your local NLS Coordinator and set by your System 
Administrator. These system default values are what you get when you log in 
unless you make provisions for something different. 

You can determine the setting of your NLS environment variables by typing: 
env 

If none of the NLS environment variables is set, as indicated by env, or if 
you are not sure what NLS environment you need, consult with your NLS 
Coordinator to determine the appropriate settings for your locale. 

If the local system default values are not satisfactory, you can get the NLS 
environment you need by setting the environment variables appropriately. 

For example, if you need a French locale, run the Bourne or Korn shell 
commands: 

LANG=french ; export LANG 

This is equivalent to the C shell command:

    setenv LANG french

It is generally convenient to add these commands to your .profile file (Bourne
or Korn shell) or .login file (C shell) so that your preferred environment
will be set when you log in.

If you will be running applications that need an NLS environment different 
from the system default and different from your individual environment, it is 
convenient to create a shell script that sets the environment variables as needed 
for the application. 

For example, to run the command prog in a special NLS environment, the 
following sh script could be used: 

    : # run prog

    # set special NLS environment for prog
    LANG=english ; export LANG
    LC_TIME=italian ; export LC_TIME
    LC_MONETARY=german ; export LC_MONETARY
    LC_NUMERIC=french ; export LC_NUMERIC

    # run prog
    prog file1 file2



Such a script could be installed by your System Administrator in /usr/bin
and used to invoke your program, saving you the time of setting up a special
NLS environment by hand.



Setting Your Terminal 

First, check your terminal to ensure that it is configured for transmitting and 
receiving 8-bit data. For further information on terminal configuration, see 
Facilities for 200/300/500: HP-UX Concepts and Tutorials. 

To use international software, your terminal should also be set so that 
single-byte data is not corrupted by system software that might otherwise 
attempt to interpret the eighth bit of a byte. This bit is needed as part of the 
character code. To disable such interpretation, run: 

stty -istrip -parity 

It is generally convenient to add this command to your .profile or .login 
file. 



Reference Information for Internationalized Commands 

For any command you intend to use, consult the online man pages or the 
appropriate page in the HP-UX Reference to determine the extent to which 
it has been internationalized. The section "EXTERNAL INFLUENCES, 
Environment Variables" indicates NLS environment variables that affect the 
behavior of a command. For example, to see how LC_TIME affects the date
command, run:

man date 



Internationalized Messages 



A command that has been internationalized for messages will have, in the 
HP-UX Reference section "EXTERNAL INFLUENCES, Environment 
Variables," a comment such as "LANG determines the language in which 
messages are displayed." Such a command, however, will not necessarily have 
message catalogs for all languages or even for any language other than for a 
default locale. 

When such a command is run, current locale messages will be displayed if 
they are available. Otherwise, default locale messages will be displayed. The 
command will, however, perform correctly for the current locale. 

For example, sort will correctly sort data in all supported locales. Messages 
issued by sort will be in the C locale (the default locale for HP-UX commands) 
unless localized message catalogs have been created. 

See the chapter "Localizing International Software" in this manual for more
information on localizing message catalogs.

Using Internationalized Commands 

To see what locales are installed on your system, run:

    nlsinfo

Set LANG to one of the installed locales and run: 
date 

You should get a result with the format and naming conventions of the locale 
specified by LANG. 

To test this further, try: 
cat file 

where file is non-existent. If there is a localized message catalog for cat you 
should get the cannot open message in the locale specified by LANG. If not, you 
will get the message in the C locale. 

If you do not get the expected results, check with your System Administrator
to verify that the required language-specific files are properly installed on the
system. Otherwise, you should now be able to use internationalized commands
without further special action.



Developing International Software 



4 



Read this chapter if you are: 

■ A programmer for the local system 

This chapter covers the standard programming issues for: 

■ Developing international software 

■ Internationalizing existing software 

For a discussion of special cases, see the chapter "Advanced NLS Topics".



General Programming Issues 

The programming issues your software must accommodate are: 

■ Initialization 

■ Preservation of data integrity 

■ Character and string processing 

■ Messaging 



Initializing NLS 

When you work with internationalized software, it is always necessary to 
provide the appropriate NLS initialization, and, in some cases, it is also 
necessary to use NLS library routines rather than conventional library routines. 
More extensive programming changes may be needed in special cases. 

There are two elements of NLS that must be initialized to activate the NLS 
behavior of a program: 

■ The program locale 

■ The program messages

The locale for a program is initialized by calling setlocale() to make locale
information accessible to the program. The messages for a program are
initialized by calling catopen() to locate the appropriate messages and make
them accessible to the program.

The two initialization routines are independent. The program's locale does not 
affect messaging, and messaging does not affect the program's locale. 

Recommended Initialization 

For most applications, the following "standard" initialization is recommended:

    #include <locale.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <nl_types.h>

    nl_catd catd;

    if ( !setlocale(LC_ALL, "") ) {
        /* locale could not be set: warn and fall back to the C locale */
        fputs("setlocale failed, continuing with \"C\" locale.", stderr);
        putenv("LANG=");
        catd = (nl_catd)-1;
    }
    else
        catd = catopen("name", 0);

With this initialization, all LC_* categories will be set to the value of LANG,
except for those categories in which the corresponding environment variable is
set to another valid locale. For environment variables that are set to a valid
locale, the value of the environment variable will override the value of LANG for
that category. If the value of LANG is not set or is set to the empty string, then
the C locale is used.
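
The precedence rule above can be sketched as a small function. This is only an illustration under stated assumptions: effective_locale is a hypothetical helper, not an HP-UX library routine, and it does not check that a name is a valid, installed locale as setlocale() does.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an HP-UX routine): choose the locale name that
 * applies to one category.
 *   category_value  value of the LC_* variable, or NULL if it is unset
 *   lang_value      value of LANG, or NULL if it is unset
 * Validity of the locale names is NOT checked in this sketch.
 */
const char *effective_locale(const char *category_value, const char *lang_value)
{
    if (category_value != NULL && category_value[0] != '\0')
        return category_value;      /* the LC_* variable overrides LANG */
    if (lang_value != NULL && lang_value[0] != '\0')
        return lang_value;          /* otherwise the category follows LANG */
    return "C";                     /* LANG unset or empty: C locale */
}
```

For example, with LANG=english and LC_TIME=italian, the LC_TIME category would resolve to italian and every other category to english.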

With this initialization, LANG and NLSPATH specify a series of paths to search 
for a message catalog. If a catalog is found on one of these paths, messages 
issued by the program will be messages from the selected message catalog. If a 
message catalog is not found, messages issued by the program will be C locale 
messages. 

Note that even if setlocale() is successful, it is possible for catopen() to fail.

This "standard" initialization assumes that messaging uses the "standard" 
default messages described in the section "Programming for Messages" below. 
For special cases, a non-standard initialization may be required. See the 
chapter "Advanced NLS Topics" in this manual for more information. 



Data Integrity 

Data integrity means that in processing codeset data, the data must not be 
corrupted. For single-byte codesets, the 8th bit must be preserved; it must 
not be stripped nor used by the program. For multi-byte codesets, single-byte 
characters must be correctly distinguished from multi-byte characters. 

HP's multi-byte codesets utilize a coding scheme in which the single-byte
character codes for ASCII can be intermixed with the two-byte character codes
used to represent ideograms. In these codesets, it is possible for the second
byte of a two-byte character to have the same value as an ASCII character.
For an arbitrary byte, it is not possible to know if the byte is a single-byte
character or the second byte of a multi-byte character. This is the "byte
redefinition" problem in which the second byte of a multi-byte character may
be incorrectly interpreted as a one-byte character.
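
The following sketch illustrates the problem. It assumes, purely for illustration, that any byte with the high bit set starts a two-byte character; this is not HP's actual first-byte rule, and is_first_of_two is a hypothetical stand-in for a real byte-status test. The naive routine inspects every byte, while the scanning routine walks the string from the beginning and skips second bytes.

```c
#include <stddef.h>

/* ASSUMPTION for illustration only: a high-bit byte starts a two-byte
 * character. Real codesets define their own first-byte ranges. */
static int is_first_of_two(unsigned char b)
{
    return b >= 0x80;
}

/* Naive count: treats every byte as a character, so the second byte of a
 * two-byte character can be mistaken for the ASCII character c. */
size_t count_ascii_naive(const unsigned char *s, unsigned char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/* Sequential scan from the start of the string: both bytes of a two-byte
 * character are skipped together, so only true single-byte matches count. */
size_t count_ascii_scan(const unsigned char *s, unsigned char c)
{
    size_t n = 0;

    while (*s != '\0') {
        if (is_first_of_two(*s) && s[1] != '\0') {
            s += 2;
        } else {
            if (*s == c)
                n++;
            s++;
        }
    }
    return n;
}
```

For a string made of one two-byte character whose second byte is 0x41, followed by the letter A, the naive count of A is 2 and the scanning count is 1.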

To aid in processing multi-byte codesets and avoid the byte redefinition 
problem, there are two sets of routines available to the programmer. 



Programming with Multi-byte Characters 



For dealing with HP's multi-byte codesets, see nl_tools_16(3C) in the HP-UX
Reference, which describes a set of byte-status macros: FIRSTof2, SECof2, and
BYTE_STATUS. These macros can be used to determine whether a byte value
represents a single-byte character or part of a multi-byte character.

Probably more useful are the character pointer macros, which are analogous to
byte pointer operations:



    Macro Call        Byte Pointer Analog

    CHARAT(p)         (*p)
    ADVANCE(p)        (p++)
    CHARADV(p)        (*p++)
    WCHAR(c, p)       (*p = c)
    WCHARADV(c, p)    (*p++ = c)



These macros operate on byte pointers, but they make the appropriate calls to
FIRSTof2, etc., and advance the pointer one or two bytes as needed.

These macros are not always needed. For example, the following program will
correctly copy single-byte as well as multi-byte character strings.

    char *f, *t;

    while ( *t++ = *f++ ) ;

However, to copy only the printable characters of a character string requires
special treatment. The following program will work for single-byte codesets
but not for multi-byte codesets because of the byte redefinition problem: it
is possible for the second byte of a multi-byte character to be a non-printable
ASCII character. Such a byte would not be copied to the destination string.

    #include <ctype.h>

    char *f, *t;
    int c;

    while ( c = *f++ )
        if ( isprint(c) )
            *t++ = c;

Using CHARAT macros, the preceding program can be made to operate correctly
on single- and multi-byte data:

    #include <ctype.h>
    #include <nl_ctype.h>

    char *f, *t;
    int c;

    while ( c = CHARADV(f) )
        if ( c > 255 || isprint(c) )
            WCHARADV(c, t);

Note that isprint() is defined for single-byte codesets only and that all
multi-byte characters (c > 255) are considered printable.



Note    Although these macros seem transparent, there are some
        cautions that must be observed when using them.

        ■ First, they cannot determine byte status for an arbitrary
          byte within a string. In general, multi-byte strings must be
          examined sequentially from the beginning.

        ■ Second, the macros are not perfect analogs of the byte
          pointer versions. In particular, the program sequence:

              *t++ = *f++;

          cannot be done as:

              WCHARADV(CHARADV(f), t);

          It must be done as:

              int c;

              c = CHARADV(f), WCHARADV(c, t);

        ■ Using the macros will increase program size and reduce
          performance. For example, when *t++ = *f++ is converted
          to CHARAT macros, it generates about 350 bytes of additional
          code. Where size is a problem, the function versions of the
          macros can be used at some reduction in performance.

        ■ The extent of size and performance impact is application
          dependent. To reduce this impact, a common strategy for
          processing multi-byte character data is to use byte pointer
          operations where character interpretation is not an issue, and
          to use multi-byte routines only where needed.



See nl_tools_16(3C) in the HP-UX Reference for more information on
programming with multi-byte characters.

Programming with Wide-Characters 

For some applications, character processing may be more convenient if 
multi-byte characters are represented as constant width characters — so-called 
wide-characters. 

For such situations, a set of routines is available to convert between multi-byte
characters and wide-characters. The wide-character representation is more
convenient for some things; for example, pointer manipulation works without
the need for the FIRSTof2 type of macros.

However, it is less convenient for others. For example, multi-byte string
manipulation routines such as strcoll() and printf() do not work for
wide-character strings.

The "copy printable characters" example, written to use wide-characters, would 
appear as the following: 



    #include <ctype.h>
    #include <stdlib.h>

    char fm[NM], tm[NM];
    wchar_t fw[NW], tw[NW];
    wchar_t *f = fw, *t = tw;
    int c;

    mbstowcs(fw, fm, NW);   /* convert multi-byte to wide-character */
    while ( c = *f++ )
        if ( c > 255 || isprint(c) )
            *t++ = c;
    *t = 0;                 /* terminate the wide-character string */

    wcstombs(tm, tw, NM);   /* convert wide-character to multi-byte */



Note ■ The issue of printable characters is handled as above. 

■ Error-checking the conversion between multi-byte and 
wide-character data is omitted. 

■ NM and NW are assumed to be appropriately defined. 
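
A version with the omitted error checking might look like the sketch below. It makes two assumptions that go beyond the example above: the function name copy_printable and the capacity NW are invented for illustration, and the printable test uses the standard iswprint() from <wctype.h> in place of the c > 255 test, which may not be available on every system.

```c
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

#define NW 256      /* assumed capacity, in wide characters */

/*
 * Copy the printable characters of the multi-byte string src into dst,
 * which holds size bytes. Returns 0 on success, or -1 if a conversion
 * fails or a buffer is too small.
 */
int copy_printable(const char *src, char *dst, size_t size)
{
    wchar_t fw[NW], tw[NW];
    wchar_t *f = fw, *t = tw;
    size_t n;

    n = mbstowcs(fw, src, NW);          /* multi-byte to wide-character */
    if (n == (size_t)-1 || n >= NW)
        return -1;                      /* invalid sequence or input too long */

    while (*f != L'\0') {
        if (iswprint((wint_t)*f))
            *t++ = *f;                  /* keep printable characters only */
        f++;
    }
    *t = L'\0';

    n = wcstombs(dst, tw, size);        /* wide-character to multi-byte */
    if (n == (size_t)-1 || n >= size)
        return -1;                      /* unconvertible, or dst too small */
    return 0;
}
```

The results depend on the current locale, so a program would ordinarily call setlocale() before using such a routine.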



Conversion of Existing Programs 

When internationalizing an existing program, conversion to preserve data 
integrity is conveniently done in two steps: 

1. Conversion to single-byte data.

2. Conversion to multi-byte data. 

Conversion to single-byte data can be subtle. Some programs use the 8th bit as 
a flag to indicate special treatment of the 7-bit character. In general, it may 
not be easy to determine whether a program does this. In any event, programs 
that use or remove the 8th bit must be changed. If the 8th bit is used for data, 
it will be necessary to put the 8th bit data in a new data structure and it may 
be necessary to design a new algorithm to access the new data structure. 

Once a program is correct for single-byte data, the conversion to multi-byte 
data is straightforward. No structural changes are needed, but proper handling 
of multi-byte characters is needed. 
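
As a sketch of the first step, suppose an existing program marks characters by setting the 8th bit. One possible conversion moves the flag into a parallel array so that all 8 bits of each byte remain character data. The structure and function names below are hypothetical, as is the fixed capacity.

```c
#include <stddef.h>

#define LINE_CAPACITY 128   /* assumed capacity for this sketch */

/* Old style: the flag shares the byte with the character. */
unsigned char mark_in_band(unsigned char c)   { return c | 0x80; }
unsigned char unmark_in_band(unsigned char c) { return c & 0x7F; }

/* New style (hypothetical layout): flags live in their own array. */
struct marked_line {
    unsigned char text[LINE_CAPACITY];    /* full 8-bit character data */
    unsigned char marked[LINE_CAPACITY];  /* 1 if the character is marked */
    size_t len;                           /* number of characters in use */
};

void mark_char(struct marked_line *line, size_t i)
{
    if (i < line->len)
        line->marked[i] = 1;
}

int is_marked(const struct marked_line *line, size_t i)
{
    return i < line->len && line->marked[i] != 0;
}
```

With the old style, marking and then unmarking the 8-bit character code 0xE9 yields 0x69, so the original character is lost. With the parallel array the text byte is never modified.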



As we saw above, for example, multi-byte data cannot be tested with
isprint(). In general, it is necessary to examine each instance of byte
processing to determine whether special handling of multi-byte data is needed.



Character and String Processing 

Character and string processing for international software must ensure that 
local customs are observed regarding such things as 

■ Treatment of accented characters 

■ Formatting of date and time 

■ Formatting of numeric and monetary quantities 

■ Comparison of string data 

Most character and string processing is provided by internationalized 
library routines that give correct results for the currently active locale. 
Note that there may be restrictions in the use of some library routines and
minor program changes may be needed. You can find more information in
nl_tools_16(3C) in the HP-UX Reference.

The ctype(3C) routines isalpha(), isupper(), etc., and the conv(3C) routines
toupper(), etc., are internationalized and give locale-sensitive results.
However, they are defined only for single-byte data and cannot be used for
multi-byte data.

The numeric formatting routines ecvt, gcvt, strtod, atof, printf,
fprintf, etc., have been internationalized and give locale-sensitive results for
single-byte and multi-byte data. For information about restrictions on the
use of multi-byte data, see ecvt(3C), strtod(3C), and printf(3C) in the HP-UX
Reference, Section 3C.

The ctime(3C) date and time routines ctime() and asctime() always give C
locale results. To get locale-sensitive results use nl_cxtime(), nl_ascxtime(),
or strftime().
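
As a brief sketch of the strftime() approach, the function below formats a date using the day and month names of the current LC_TIME locale. The function name and the format string are chosen for illustration only; a program would ordinarily call setlocale() first, otherwise the C locale names are used.

```c
#include <time.h>

/*
 * Hypothetical helper: write a long-form date such as
 * "Monday, 02 January 1989" into buf. Returns the number of bytes
 * written, or 0 if buf is too small.
 */
size_t format_long_date(char *buf, size_t size, const struct tm *tm)
{
    /* %A and %B expand to the locale's full weekday and month names */
    return strftime(buf, size, "%A, %d %B %Y", tm);
}
```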

Generalized monetary formatting is more involved than numeric formatting
since in some countries the currency symbol is placed before the amount. In
other countries the currency symbol is placed after the amount. There are no
library routines that provide monetary formatting; you will have to provide
your own.

The currency symbol and position information is available in the structure
returned by localeconv() and can be used as:

    #include <locale.h>
    #include <stdio.h>

    struct lconv *lcs;
    float number;
    char *cs_p, *cs_f;      /* currency symbol as prefix, as suffix */

    lcs = localeconv();

    number = ...

    if ( (number >= 0 && lcs->p_cs_precedes == 1) ||
         (number < 0 && lcs->n_cs_precedes == 1) ) {
        cs_p = lcs->currency_symbol;
        cs_f = "";
    }
    else {
        cs_p = "";
        cs_f = lcs->currency_symbol;
    }

    printf("%s %6.2f %s\n", cs_p, number, cs_f);

Other information in the lconv structure describes decimal point, thousands 
separator, spaces used with the currency symbol, etc. 
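
A slightly fuller formatter that also honors the monetary decimal point and the spacing members might be sketched as follows. The function name format_money is hypothetical, the amount is passed in hundredths to avoid floating-point rounding, and sign placement is simplified to a leading minus sign rather than following the sign position members. Any member value other than 1 is treated as "follows" or "no space".

```c
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: format an amount given in hundredths
 * (1250 means 12.50) using the conventions described by lc.
 * Returns the value returned by snprintf().
 */
int format_money(char *buf, size_t size, long cents, const struct lconv *lc)
{
    int negative = cents < 0;
    long mag = labs(cents);
    char precedes = negative ? lc->n_cs_precedes : lc->p_cs_precedes;
    char spaced = negative ? lc->n_sep_by_space : lc->p_sep_by_space;
    const char *gap = (spaced == 1) ? " " : "";
    const char *sign = negative ? "-" : "";
    const char *point = ".";

    /* use the locale's monetary decimal point when one is defined */
    if (lc->mon_decimal_point != NULL && lc->mon_decimal_point[0] != '\0')
        point = lc->mon_decimal_point;

    if (precedes == 1)      /* currency symbol before the amount */
        return snprintf(buf, size, "%s%s%s%ld%s%02ld", sign,
                        lc->currency_symbol, gap, mag / 100, point, mag % 100);

    /* currency symbol after the amount */
    return snprintf(buf, size, "%s%ld%s%02ld%s%s", sign,
                    mag / 100, point, mag % 100, gap, lc->currency_symbol);
}
```

A program would normally pass the pointer returned by localeconv(). Taking the structure as a parameter also allows the routine to be exercised with hand-built conventions.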

The string(3C) string comparison routines strcmp() and strncmp() always
give C locale results. To get locale-sensitive results use strcoll() or
nl_strncmp().
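
For instance, a list of names could be sorted in locale order by passing a strcoll()-based comparison function to qsort(). This is a minimal sketch; the function names are hypothetical, and the order obtained depends on the LC_COLLATE setting in effect when it runs.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparison function: compare two strings in locale order */
static int compare_collated(const void *a, const void *b)
{
    const char *sa = *(const char * const *)a;
    const char *sb = *(const char * const *)b;

    return strcoll(sa, sb);
}

/* Sort an array of count string pointers using the current collation. */
void sort_collated(const char **names, size_t count)
{
    qsort(names, count, sizeof names[0], compare_collated);
}
```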

For some applications, a performance improvement may be obtained by using
strxfrm() to convert strings to a form that can be compared using strcmp().
The following program illustrates this application:

    #include <string.h>

    char *s1, *s2, *t1, *t2;
    int n1, n2;

    ...

    strxfrm(t1, s1, n1);    /* t1 receives the transformed copy of s1 */
    strxfrm(t2, s2, n2);    /* t2 receives the transformed copy of s2 */

    if ( strcmp(t1, t2) > 0 ) {     /* same as strcoll(s1, s2) > 0 */
        ...
    }

Note that error checking of the conversion by strxfrm() is omitted.
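
One way to size the destination buffer and check the conversion is sketched below, assuming the ANSI C behavior in which strxfrm() called with a length of zero returns the length of the transformed string. The function name make_sort_key is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated sort key for s, or NULL
 * if memory cannot be allocated or the transformation does not fit.
 * The caller frees the key.
 */
char *make_sort_key(const char *s)
{
    size_t need = strxfrm(NULL, s, 0);  /* length of the transformed string */
    char *key = malloc(need + 1);

    if (key == NULL)
        return NULL;
    if (strxfrm(key, s, need + 1) > need) {
        free(key);                      /* result would have been truncated */
        return NULL;
    }
    return key;
}
```

Keys built this way can be compared repeatedly with strcmp(), which is where the performance benefit arises when the same strings take part in many comparisons, as in a sort.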



Conversion of Existing Programs 

Conversion of existing programs is necessarily an ad hoc process. The grep
command can be used on existing source code to find calls to routines, such as
ctime() and strcmp(), which may require changes.



Creating and Using a Message Catalog System 

The HP-UX message catalog system allows program messages to be stored 
separately from the logic of the program, to be translated into different 
languages, and to be retrieved at run-time, according to the language 
requirements of each user. 

Program messages might be: 

■ Information to the user, e.g. file not found 

■ Responses from the user, e.g. tomorrow as used by the at command 

■ Strings used to format other messages, e.g. %1$d %2$s\n

These messages would ordinarily appear in the source program as quoted 
strings, such as: 

    printf("file not found\n");

    if ( strcmp(s, "tomorrow") == 0 ) ...
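
Format strings with numbered arguments, such as %1$d %2$s\n above, let a translator reorder the inserted values without any change to the program. The sketch below assumes a printf family that supports numbered arguments (an X/Open extension, not part of ANSI C); the function name is hypothetical and fmt stands for a string that would come from the message catalog.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format a count and a noun using a message whose
 * conversions are numbered: %1$d refers to count, %2$s refers to what.
 */
int format_count_message(char *buf, size_t size, const char *fmt,
                         int count, const char *what)
{
    return snprintf(buf, size, fmt, count, what);
}
```

With the format "%1$d %2$s\n" the values 3 and "files" appear in calling order; a translated format such as "%2$s: %1$d\n" reverses them while the call stays the same.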

To produce a program that is internationalized for messages, do the following: 



■ Separate the program logic from program messages by using message routine 
calls in place of quoted messages in the source program. The message 
routines will retrieve message text at run-time. 

■ Create a message text source file for localization. This file contains messages 
that would ordinarily appear as quoted strings in the source program. 

■ Generate a message catalog from the message text source file. This file 
contains messages that are retrieved by the message routines. 

Localized messages can then be provided by translating the strings in the 
message text source file into another native language and then generating the 
native language message catalog. 

Programming for Messages 

The programming tools for messaging are: 

■ The gencat command, which produces a message catalog from message text
source files.

■ The catopen function, which locates a named message catalog and prepares
it for use by catgets() and catclose().

■ The catgets function, which retrieves messages from a message catalog
opened by a call to catopen().

■ The catclose function, which closes a message catalog opened by
catopen().

Opening a Message Catalog 

Message catalogs are opened by the catopen() routine:

    #include <nl_types.h>

    nl_catd catd;

    catd = catopen("name", 0);

where the name argument identifies the catalog to be opened. If catopen()
can successfully open the identified catalog, it returns a message catalog
descriptor. Otherwise it returns (nl_catd)-1. The program can test this
return value and take an appropriate action if the requested catalog cannot be
opened.

Note    ■ The catalog descriptor catd is used by catgets() and
          consequently it must be accessible to every catgets() call.

        ■ It is recommended that the program name be used as the
          name argument.



Search Path and Naming Conventions 

The names of message catalogs and their location in the file system can vary 
from one system to another. Individual applications may choose to name or 
locate message catalogs according to their own special needs. 

The flexibility to allow general location and naming of message catalogs is
provided via the NLS environment variable NLSPATH, which gives both the
location of message catalogs and the naming conventions. Message catalog
naming conventions can be defined by means of substitution field descriptors
that permit the use of run-time information. For example:

    NLSPATH=/usr/local/lib/%L/%N.cat:./%N

This specifies two paths, separated by :, to be searched for a message catalog.
The meta character, %, in a search path introduces a substitution field
descriptor, where %N is replaced by the name parameter passed to catopen(),
and %L is replaced by $LANG.

Thus, for the above value of NLSPATH, the call catopen("prog", 0) will first
attempt to open /usr/local/lib/$LANG/prog.cat. Failing this, it will
attempt to open ./prog. Note that if LANG is not set, the first path would be
/usr/local/lib//prog.cat and would probably result in a failure to find a
catalog.
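
The substitution can be pictured with the sketch below, which expands the %N and %L descriptors in one path template. It is an illustration only, not the library's implementation: expand_template is a hypothetical name, and the other descriptors are left out.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: expand %N (catalog name) and %L (value of
 * LANG) in tmpl, writing the result to out, which holds size bytes.
 * Returns 0 on success, or -1 if out is too small.
 */
int expand_template(const char *tmpl, const char *name, const char *lang,
                    char *out, size_t size)
{
    size_t used = 0;

    if (size == 0)
        return -1;
    if (lang == NULL)
        lang = "";                      /* unset LANG expands to nothing */

    while (*tmpl != '\0') {
        const char *piece = tmpl;       /* default: copy one literal byte */
        size_t len = 1;

        if (tmpl[0] == '%' && tmpl[1] == 'N') {
            piece = name;
            len = strlen(name);
            tmpl += 2;
        } else if (tmpl[0] == '%' && tmpl[1] == 'L') {
            piece = lang;
            len = strlen(lang);
            tmpl += 2;
        } else {
            tmpl += 1;
        }
        if (used + len >= size)
            return -1;                  /* no room for text plus terminator */
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

Expanding /usr/local/lib/%L/%N.cat with the name prog and LANG set to french gives /usr/local/lib/french/prog.cat; with LANG unset it gives /usr/local/lib//prog.cat, the case cautioned against above.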

If catopen() can't find a message catalog with the path names specified in
NLSPATH, it searches the default path:

    /usr/lib/nls/%l/%t/%c/%N.cat

where %l is replaced by the language element of LANG, %t is replaced by the
territory element of LANG, and %c is replaced by the codeset element of LANG.
This is summarized in the following table:



Table 4-1. Summary of NLSPATH Replacement Specifiers

    Replacement
    Specifier      Expansion by NLS

    %L             replaced by the value of LANG
    %N             replaced by the name of the application
    %l             replaced by the language element of LANG
    %t             replaced by the territory element of LANG
    %c             replaced by the codeset element of LANG



For further details on LANG and NLSPATH, see environ(5) in the HP-UX 
Reference. 

Retrieving Messages 

Once the message catalog is open, the program can retrieve messages from the 
catalog using: 



    catgets(catd, set_num, msg_num, defstr);

where catd is the catalog descriptor returned by catopen(), set_num and
msg_num identify the message to be retrieved, and defstr ("default string") is
a string that is returned if the call fails.

Ordinarily defstr is the C locale message. 

To retrieve messages, catgets() uses an internal buffer that is overwritten
on each call. This is rarely a problem since a message is ordinarily used
immediately by being printed or tested. However, see "Special Considerations
for Messaging" below.

Closing a Message Catalog 

When the program no longer needs access to the message catalog, the
catalog file should be closed. This can be done with the catclose() call but
it is generally simpler to let exit() close the catalog file when the program
terminates.



Default Messages 



A program should make provision for the case when the message catalog is not 
available. This could happen, for example, if the file system containing the 
catalog is not mounted or if there is no catalog for the current language. Note 
that catopen() does not take a default action if a catalog cannot be opened.
Provisions for default messages must be arranged by the program. There are 
two general strategies for handling this situation: 

■ The "standard" method is to include the default message as the defstr in
the catgets() call. If the catopen() call fails, it will return (nl_catd)-1,
an invalid file descriptor. This will subsequently cause catgets() to fail, and
it will return defstr, the default message. This is the recommended method
of handling default messages.

■ Alternatively, you can use a default message catalog. Note that even
the default message catalog may not be available (e.g., if the file system
containing it were not mounted). Commands using this method should
consider the probability of this situation for their application and plan
accordingly. Applications that use this method often use error message
numbers as the default string in catgets() calls.
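
The "standard" method can be sketched as a thin wrapper around catgets(). The wrapper name program_message is hypothetical; the point is only that the C locale text travels with every call, so the program still produces a sensible message when the descriptor is (nl_catd)-1.

```c
#include <nl_types.h>

/*
 * Hypothetical wrapper: fetch message msg_num from the default set,
 * falling back to default_text when the catalog is unavailable.
 */
const char *program_message(nl_catd catd, int msg_num, const char *default_text)
{
    /* catgets() returns default_text if the message cannot be retrieved */
    return catgets(catd, NL_SETD, msg_num, default_text);
}
```

A call such as program_message(catd, 1, "file not found\n") therefore yields the catalog text when a catalog was opened and the quoted default otherwise.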

If a message catalog is missing, it is seldom useful to issue a message unless 
it is reasonable to expect the catalog to be available. If a message catalog is 
missing and the catalog is critical to the successful execution of the program, it 
may be best to issue a message and terminate the program. 

Compiling and Linking 

There are no special requirements for compiling and linking. All messaging 
routines are in standard libraries and will be linked with the usual compile/link 
commands. 

Creating a New Message Catalog 

Creating a message catalog is a two step process: 

1. Create the message text source file. 

2. Use gencat to generate a message catalog from the message text source file.



The Message Text Source File 



A message text source file contains the messages from the source program.
Each message is numbered with the message number used in the corresponding
catgets() call.

A simple message catalog text file might be: 

$ Comment: a simple message text source file 

1 text for message 1 

2 text for message 2 

A message consists of a message number followed by a single space or tab 
followed by the message text and terminated by a new-line. The message text 
is a C string, including spaces, tabs and \ (backslash) escapes, but without 
surrounding quotes. Message numbers are unsigned integers and must be in 
ascending order but need not be consecutive. A line beginning with $ followed 
by a single space or tab is treated as a comment. Note that comments in 
the message text source file are not saved in the message catalog created by 
gencat. 
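
To make the line format concrete, the sketch below splits one line of a message text source file into its number and text. It is a hypothetical illustration rather than a description of how gencat works internally: the line is assumed to have had its new-line removed, and comments, directives, blank lines and a bare number with no text are all simply reported as "not a message".

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Hypothetical parser for one message line. Returns 1 and sets *num and
 * *text for a message line, or 0 for any other kind of line.
 */
int parse_message_line(const char *line, unsigned long *num, const char **text)
{
    char *end;

    if (!isdigit((unsigned char)line[0]))
        return 0;                       /* comment, directive, or blank */

    *num = strtoul(line, &end, 10);
    if (*end != ' ' && *end != '\t')
        return 0;                       /* number not followed by a separator */

    *text = end + 1;                    /* text begins after ONE separator */
    return 1;
}
```

Because only a single space or tab separates the number from the text, any further blanks belong to the message itself.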

For a large or complex group of messages it may be useful to arrange the 
messages into groups called sets. Message sets allow the programmer to 
group similar messages together within a catalog. For example, one set might 
contain all prompts, and another set might contain all error messages. A 
set is introduced by a $set directive. Messages belong to the set specified 
by the most recently appearing $set directive. Like message numbers, set 
numbers are unsigned integers and must be in ascending order but need not be 
consecutive. Message numbers in different sets are independent. 

A default set, NL_SETD, is defined in <nl_types.h> for use in source programs.
If a $set directive does not appear in the message text source file, messages
will be assigned to set NL_SETD. Using the default set and $set directives in the
same message text source file is not recommended.

A message text source file with sets might look like the following: 



$ user prompts 
$set 100 

1 Text of message number 1 
4 Text of message number 4 
9 Text of message number 9 

$ error messages 
$set 200 

1 Text of message number 1 
3 Text of message number 3 

To make leading or trailing blanks visible, the $quote directive can specify a 
quote symbol. For example: 

$ show blanks 
$quote " 

1 " leading blanks" 

2 "trailing blanks " 

For more details on the format of the message text source file, see gencat(1).

Compiling a Message Catalog

Once the message text source file is correct, a message catalog can be
generated. For example, if prog.msg contains the messages for prog.c, then
you would type the following:

    gencat prog.cat prog.msg

This generates prog.cat, a message catalog for prog.c. This step is
analogous to compiling the source program: the message text source file is
"compiled" into a binary message catalog for use by the program at run-time.



An Example of Programming with Message Catalogs 

To see how this all fits together, suppose prog.c is the standard sample
program:

main() 
{ 

printf ("hello world\n") ; 
} 

When converted to use message catalogs, prog.c would look like this:

    #include <stdio.h>
    #include <nl_types.h>

    main()
    {
        nl_catd catd;

        catd = catopen("prog", 0);
        printf(catgets(catd, NL_SETD, 1, "hello world\n"));
    }

The message text source file would be: 

$ message catalog for hello world 
1 hello world\n 

The program would be compiled as:

    cc -o prog prog.c

and the message catalog would be generated as:

    gencat prog.cat prog.msg

For this example,

■ We have used "standard" default message handling: default messages are the
default strings in catgets() calls, and these will be returned as messages if
catopen() fails.

■ The program name is also the message catalog name so that catopen() will
search the standard places when looking for a message catalog.

■ The default set, NL_SETD, is used in the source program and the use of a $set
directive in the message text source file is omitted.



Special Considerations 

■ Messages in variables require special treatment. For example, the message in:

      char *msg = "message";

      printf(msg);

  would, given a "direct" conversion, result in:

      char *msg = catgets(catd, set_num, msg_num, "message");

      printf(msg);

  This would generate a compile error when msg is declared outside a function
  or as static, because such an initializer must be a constant expression.
  The required conversion is:

      char *msg = "message";

      printf(catgets(catd, set_num, msg_num, msg));

■ Messages in arrays require somewhat more elaborate treatment. Before
  conversion, an original source might contain the following:

      static char *msg_tbl[] = {
          "message 1",
          "message 2",
          ...
          "message N"
      };

      printf(msg_tbl[i]);

  This would need conversion to:

      printf(catgets(catd, set_num, msg_num, msg_tbl[i]));

  and set_num, msg_num and message index i must be synchronized. In
  particular, note that msg_tbl[0] is message 1 and that 0 is not a valid
  message number.
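
One simple way to keep the table index and the message number synchronized is to derive the number from the index in a single place. The sketch below assumes that the table entries are stored in the default set as messages 1, 2, 3 in table order; the function name and the three-entry table are hypothetical.

```c
#include <stddef.h>
#include <nl_types.h>

static const char *msg_tbl[] = {
    "message 1",
    "message 2",
    "message 3"
};

#define MSG_COUNT (sizeof msg_tbl / sizeof msg_tbl[0])

/*
 * Hypothetical helper: return the catalog text for table entry i,
 * using msg_tbl[i] as the default. Entry i is message number i + 1,
 * because 0 is not a valid message number.
 */
const char *table_message(nl_catd catd, size_t i)
{
    if (i >= MSG_COUNT)
        return "";                      /* out of range: no such message */
    return catgets(catd, NL_SETD, (int)i + 1, msg_tbl[i]);
}
```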

■ Multiple messages in a printf call might appear as:

      printf("message 1", "message 2");

  But, because catgets() overwrites its message string on each call, these
  cannot be translated as:

      printf(catgets(catd, set_num_1, msg_num_1, "message 1"),
             catgets(catd, set_num_2, msg_num_2, "message 2"));

  For this situation it is necessary to copy one of the messages:

      char m1[N];

      strcpy(m1, catgets(catd, set_num_1, msg_num_1, "message 1"));
      printf(m1, catgets(catd, set_num_2, msg_num_2, "message 2"));

■ Both catgets() and gencat impose limits on the length of messages
they can handle. These limits may make it necessary to compose a large
message, such as a help screen, from several smaller messages. For further
information, see catgets(3C) and other references in HP-UX Reference.

■ The message system makes no provision to ensure that the correct catalog
is used with a program. If an incorrect version of a message catalog is
inadvertently installed, your program will issue messages but they will
probably not make sense. You may wish to add validation messages that
contain the program revision code and the locale so the program can
validate the message catalog it uses. This could be done as follows:

char *p_rev =                   /* program revision */
    "$Revision: 1.4 $";         /* catgets 1 */
char *c_rev;                    /* catalog revision */
char *p_loc =                   /* program locale */
    "C";                        /* catgets 2 */
char *c_loc;                    /* catalog locale */

c_rev = catgets(catd, NL_SETN, 1, p_rev);
if (strcmp(c_rev, p_rev) != 0) {
    printf("program/message catalog revision mis-match\n");
    catd = (nl_catd) -1;
}

if (getenv("LANG") != NULL)     /* keep the default "C" if LANG is unset */
    p_loc = getenv("LANG");
c_loc = catgets(catd, NL_SETN, 2, p_loc);
if (strcmp(c_loc, p_loc) != 0) {
    printf("program/message catalog locale mis-match\n");
    catd = (nl_catd) -1;
}

This example uses an rcs(1) $Revision$ line (see the discussion in co(1)) so
that the revision code can be updated automatically. The special comments
/* catgets 1 */ and /* catgets 2 */ enable findmsg to find the validation
messages. See the discussion in the "Source Code Management" section of
this chapter.

The message text source file for this program would contain: 

1 $Revision: 1.4 $ 

2 C 

Note that both of these messages are potential problems for someone 
attempting to localize the program. Message 1, the revision line, must not
be localized. Message 2, specifying the locale, must be localized but the 
translation is not obvious to someone unfamiliar with the program. Comments 
in the message text source file won't help since they are not saved in the 
message catalog. See "Guidelines for Using Messaging" below for a description 
of a "cookbook" to help the translator avoid errors. 
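The array and multiple-message cases described earlier in this section can be sketched together as follows. This is a minimal illustration under stated assumptions, not code from an actual program: the table contents, the helper names table_message and format_two, the buffer size MSG_MAX, and the value of NL_SETN are all invented for the example.

```c
#include <nl_types.h>
#include <stdio.h>
#include <string.h>

#define NL_SETN 1        /* assumed set number */
#define MSG_MAX 256      /* assumed upper bound on message length */

static char *msg_tbl[] = {
    "message 1",
    "message 2",
    "message 3"
};

/* msg_tbl[i] is message i + 1, because 0 is not a valid message number. */
char *table_message(nl_catd catd, int i)
{
    return catgets(catd, NL_SETN, i + 1, msg_tbl[i]);
}

/* Format two catalog messages into out. The first message is copied
 * before the second catgets() call, since that call may overwrite it. */
int format_two(nl_catd catd, char *out, size_t outsize)
{
    char m1[MSG_MAX];

    strncpy(m1, table_message(catd, 0), sizeof m1 - 1);
    m1[sizeof m1 - 1] = '\0';
    return snprintf(out, outsize, "%s, %s", m1, table_message(catd, 1));
}
```

When the catalog descriptor is invalid, catgets() returns the default strings from msg_tbl, so the sketch behaves sensibly whether or not a catalog is installed.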

Libraries with Messages 

Library routines, as well as programs, can use message catalogs. For example,
the C library routine perror(3C) uses a message catalog and can be used by a
program that also uses a message catalog. All the considerations for programs
apply to libraries. There are also some special considerations.

In general, the scope of the variables of a library routine is restricted to the
routine so they do not conflict with variables of the main program. The catalog
descriptor must likewise be declared (for example, as a static variable) so
that there can be no conflict with the main program, since the main program
may also use a message catalog.

Since a library routine might be called several times by a program, some 
consideration should be given to the way the message catalog file is opened. 
There are two general strategies: 

■ The easy strategy is to open the catalog when it is needed and close it after 
use. This uses a file descriptor only when it is needed. 

■ For cases in which the library routine is called frequently, it may be desirable 
to avoid multiple opens/closes of the catalog. This can be done with the 
following: 

static nl_catd catd;
static int oflg = FALSE;

if (!oflg) {
    oflg = TRUE;
    catd = catopen( ... );
}

catgets(catd, set_num, msg_num, def_str);

Once open, the file descriptor remains in use for the remainder of the program. 
The catalog will be closed by exit at program termination. Note, however, 
that this method cannot be used if LANG can change between calls to the 
routine. 
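The "easy" strategy (open the catalog when needed and close it after use) might look like the sketch below. The catalog name mylib_example, the function name lib_message, and the value of NL_SETN are hypothetical. The message is copied into a caller-supplied buffer because the string returned by catgets() should not be relied on once the catalog has been closed.

```c
#include <nl_types.h>
#include <string.h>

#define NL_SETN 1                     /* assumed set number */
#define LIB_CATALOG "mylib_example"   /* hypothetical catalog name */

/* Copy message msg_num into buf, opening and closing the catalog
 * around the lookup. The descriptor is local to this routine, so it
 * cannot conflict with a descriptor used by the main program. */
char *lib_message(int msg_num, const char *def_str, char *buf, size_t bufsize)
{
    nl_catd catd;

    if (bufsize == 0)
        return buf;
    catd = catopen(LIB_CATALOG, 0);
    strncpy(buf, catgets(catd, NL_SETN, msg_num, def_str), bufsize - 1);
    buf[bufsize - 1] = '\0';
    if (catd != (nl_catd)-1)
        catclose(catd);
    return buf;
}
```

Because the catalog is reopened on every call, this version also follows a change of LANG between calls, at the cost of an open and a close each time.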





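[Flow diagram: prog.c is processed by findstr to give prog.str. After
prog.str is edited to remove non-message strings, insertmsg produces
nl_prog.c and prog.msg. nl_prog.c is edited to add other NLS routines and
compiled to give prog. prog.msg is edited and processed by gencat to give
prog.cat, from which prog reads its messages.]

Figure 4-1. Converting a Non-Internationalized Program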




Conversion of Existing Programs for NLS Messaging 

The conversion of an existing program to use messages can be automated to a 
substantial degree. Consequently, even when writing a new program you may 
find it easier to write the program without messages and, when it is working, 
convert it to use messages. 

The conversion process is: 

1. Find all quoted strings in the source program. Such strings may be 
messages. 

2. Review the list of quoted strings and remove any that are not messages. 

3. Assign a message number to each message string and replace the string in
the source program by an appropriate call to catgets().

4. Generate a message catalog from the numbered message strings.

HP-UX commands are available that make this process fairly easy.

1. Finding Strings in a Program 

The command findstr will examine a C source program and find all string
constants (other than those that appear in comments). These strings, along
with their quotes, are written to standard output along with information
indicating the position of the string in the source file. A typical use would be:

findstr prog.c > prog.str

The string file prog.str would now contain a copy of each string found in
prog.c.

The findstr command expects the strings of your program to be syntactically
correct with the quotes properly matched. To ensure that this is the case, it is
a good policy to use findstr only on tested programs.

2. Removing Non-Messages from the Strings 

Most of the strings in the string file are messages and would need to be
localized. Some of the strings, however, would never be localized. For example,
the type specifier for fopen() is a string such as "r" or "w+". These strings
are not messages and would not be localized. Some format strings would be
localized but some would not. The string file must be reviewed and any entries
for non-message strings should be removed.

When editing the string file, take care not to modify the location information
for strings that are left in the file. Also, note that if the source file is changed,
the string file may be invalidated and should be regenerated.
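As an illustration of the review, the hypothetical routine below contains three string constants that findstr would report. The comments show one plausible classification. The function itself and the judgment about each string are assumptions made for this example.

```c
#include <stdio.h>

/* Count the lines in a file; returns -1 if the file cannot be opened. */
long count_lines(const char *path)
{
    FILE *fp;
    int c;
    long n = 0;

    fp = fopen(path, "r");          /* "r": not a message, remove its entry */
    if (fp == NULL) {
        /* format string containing text: a message, keep its entry */
        fprintf(stderr, "cannot open %s\n", path);
        return -1;
    }
    while ((c = getc(fp)) != EOF)
        if (c == '\n')
            n++;
    fclose(fp);
    printf("%ld\n", n);             /* format with no text: remove its entry */
    return n;
}
```

Only the entry for "cannot open %s\n" would be left in the string file, so only that string would be replaced by a catgets() call.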

3. Inserting Catgets Calls 

Once the string file contains only the strings that will need localization, you are 
ready to create a messaging version of your program. 

This is done using the insertmsg command which takes care of a few
administrative details:

■ It assigns a message number to each string in the string file and writes the
numbered messages to standard output in a format suitable for use by gencat.
This is the message text source file for the program.

■ It creates a copy of the source program in which each string identified in the
string file is replaced by a catgets() call with the assigned message number.
The name of the new source program file is the name of the original source
file with the prefix nl_.

A typical use would be:

insertmsg prog.str > prog.msg

If the prog.str file were created from the source file prog.c, the new source
file would then be nl_prog.c.



Note    The findstr and insertmsg commands will not recognize
the problem cases identified in the "Special Considerations"
section in this chapter, and they will convert them without
comment. Some of these conversions will draw a syntax error
from the compiler; others will give incorrect results with no
indication. The recommended strategy is to let the compiler
find the syntax errors and to review the remaining conversions.






4. Editing the Modified Source Program 

Your new source program will need some minor editing before it can be used. 
A string such as: 

... "string" ...

in the original source file, would have been changed to:

... catgets(catd, NL_SETN, msg_num, "string") ...

The msg_num was assigned by insertmsg. You must provide definitions for
catd and NL_SETN. This can be done by adding the following lines near the
beginning of the program:

#include <nl_types.h>
#define NL_SETN 1

nl_catd catd;

catd = catopen("name", 0);

The catopen() call would ordinarily be part of the "standard" initialization.
See the section "Initializing NLS" in this chapter for additional information. 

After these modifications, the new source program can be compiled and linked. 

5. Editing the Message Text Source File 

For many cases, the message text source file, prog.msg in the above example, 
will need no modification. However, if you are using sets, appropriate $set 
directives must be inserted. 

6. Creating a Message Catalog 

After any changes to the message text source file, the message catalog can be
created using gencat. As in the earlier example:

gencat prog.cat prog.msg






Testing a Message Catalog 

Once you have an executable program and a message catalog, you can test the 
program to be sure that it retrieves messages from the correct message catalog. 

If you used the "standard" message initialization, the use of NLSPATH makes 
testing easy. For the following example, we assume: 

■ The executable program is named prog. 

■ The catalog is opened by catopen("prog", 0).

■ The original messages are in prog.msg.

■ The default value for LANG is null, i.e., unset. 

The following script prepares test directories and catalogs: 

# make a directory
mkdir ./french

# make a copy of the message text source file
cp prog.msg french.msg

# modify the messages to distinguish
# default messages from catalog messages
vi french.msg

# generate a message catalog with the modified messages
gencat french/prog.cat french.msg

The catalog in directory ./french is now ready for testing.

The following script tests the program for default messages and "french" 
messages. 

# set NLSPATH
NLSPATH=./%L/%N.cat ; export NLSPATH
echo $NLSPATH

# test the default messages
echo LANG = $LANG
prog

# test the catalog messages
LANG=french ; export LANG
echo LANG = $LANG
prog
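To see why this test works, it can help to trace how an NLSPATH template such as ./%L/%N.cat turns into a file name. The function below is a simplified sketch of that substitution, not the catopen() implementation: the name expand_nlspath is hypothetical, and only %L (the value of LANG), %N (the catalog name) and %% are handled.

```c
#include <stddef.h>
#include <string.h>

/* Expand one NLSPATH template into out.
 * %L is replaced by lang, %N by name, and %% by a single %.
 * For any other %x the x is copied literally.
 * Returns 0 on success, -1 if out is too small. */
int expand_nlspath(const char *tmpl, const char *lang, const char *name,
                   char *out, size_t outsize)
{
    size_t n = 0;

    if (outsize == 0)
        return -1;
    for (; *tmpl != '\0'; tmpl++) {
        const char *sub = NULL;
        char one[2];
        size_t len;

        if (*tmpl == '%' && tmpl[1] != '\0') {
            tmpl++;
            if (*tmpl == 'L')
                sub = lang;
            else if (*tmpl == 'N')
                sub = name;
        }
        if (sub == NULL) {          /* literal character */
            one[0] = *tmpl;
            one[1] = '\0';
            sub = one;
        }
        len = strlen(sub);
        if (n + len >= outsize)
            return -1;
        memcpy(out + n, sub, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

With LANG set to french the template yields ./french/prog.cat, the catalog built by the first script. With LANG unset the result is .//prog.cat, which does not exist, so the program falls back to its default messages.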






Installing a Message Catalog 



When you are satisfied that your messaging program correctly accesses 
its message catalog, it can be installed. See "Administering International 
Software" for more details. 

Source Code Management 

Following are some suggestions and comments on the management of messaging 
source programs. 

Keeping nl_prog.c Files

There are two approaches regarding the modified source files:

■ You can rename the nl_* files to the original names and keep the modified
version as the source program. This is the more commonly used approach.
It eliminates the need for reconversion but means the source files have the
catgets() calls in them and are more awkward to read.

■ Or you can keep the original source files and convert them whenever they 
are modified. This eliminates the need to read the messaging statements 
but means the source files must be converted whenever a change is needed. 
This approach may be feasible only if editing of the string file and converted 
source files is minimal or can be automated. 

Multi-file Programs 

If your program consists of a number of files, the conversion process is only
slightly more complex than for a single file. The findstr, insertmsg, and
gencat commands all take multiple file input and perform appropriately. For
more information, see findstr(1), insertmsg(1), and gencat(1) in the
HP-UX Reference.

Adding a Message to a Messaging Program 

Once your program provides message catalog support, you may need to add a 
message to the program. If you keep the original version of the source program 
(without the message catalog calls), adding a new message is done simply by 
adding the message to the source program and converting the program as 
above. 






If you keep the nl_* version of the source program (with the message catalog 
calls), adding a message means that you must assign a message number to the 
new message and this new number must not conflict with those already used in 
the message catalog. To assign new message numbers, you will need a list of 
existing message numbers. These are available from two places: in the message 
catalog and in the source program. 

The dumpmsg command will list the messages in a message catalog:

dumpmsg prog.cat >prog.msg

If there are multiple versions of the program, be sure that the message catalog 
and the source program are for the same version. 

The findmsg command will list the messages in a source program:

findmsg prog.c >prog.msg

This method is generally preferred since it ensures that the message text
source file agrees with the source program. The messages found are the quoted
strings in catgets() calls in the source program. If a program uses messages
in variables, you must add special comments to the source program so that
findmsg can find these messages. For example, a message in a variable and its
corresponding catgets() call would look like the following:

char *msg = "message";  /* catgets msg_num */
printf(catgets(catd, set_num, msg_num, msg));

Both of the message listing commands produce as output a message text
source file in a form suitable for input to gencat.

Once a message list is available, message numbers can be assigned to new
messages and the source program appropriately modified with new catgets()
calls that have the newly assigned message numbers. The new messages can
then be added to the message text source file and a new message catalog 
generated. 
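Choosing a number that does not conflict amounts to finding one that is larger than any number already in the message list. The sketch below does this for a message text source held in memory. The function name next_msg_num is hypothetical, and the code assumes a single set and no continuation lines that begin with a digit.

```c
#include <ctype.h>
#include <stdlib.h>

/* Return a message number one greater than the largest number used in
 * the message text source held in text.
 * Lines that do not start with a digit, such as $ directives and
 * blank lines, are skipped. */
long next_msg_num(const char *text)
{
    long max = 0;
    const char *p = text;

    while (*p != '\0') {
        if (isdigit((unsigned char)*p)) {
            long n = strtol(p, NULL, 10);
            if (n > max)
                max = n;
        }
        while (*p != '\0' && *p != '\n')   /* skip to end of line */
            p++;
        if (*p == '\n')
            p++;
    }
    return max + 1;
}
```

For a list containing messages 1, 2 and 7, the function returns 8. Numbers freed by deleted messages are deliberately not reused, so an old catalog can never supply a stale message for a new number.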

Although gencat can merge new messages into an existing message catalog, it
is just as easy and less error prone to re-create the complete message catalog.
Once the new catgets() calls have been added to the source program, this can
be done as follows:






# remove previous message catalog to preclude update
rm -f prog.cat

# generate a message text source file with the new messages
findmsg prog.c >prog.msg

# generate the new catalog
gencat prog.cat prog.msg

# list the new messages for review
dumpmsg prog.cat



Using "make" Files 

With the .msg and .cat file suffix conventions, it is possible to use make to
automate message catalog creation. The following make file illustrates the
procedure:

SOURCE = prog.c sub.c ...

# make the .msg and .cat suffixes known to make
.SUFFIXES: .msg .cat

all: prog prog.cat

prog.msg: $(SOURCE)
	findmsg $(SOURCE) >$@

.msg.cat:
	gencat $*.cat $*.msg

The command:

make prog.msg

will generate the message text source file prog.msg. The command:

make prog.cat

will generate the message catalog prog.cat from the message text source file
prog.msg. Also see "Example 2" in "Appendix A" of this manual for more
illustration of this procedure.



Guidelines for Using Messaging 

Here are some overall guidelines which you should keep in mind when 
programming for messages. 

■ Provide a "cookbook" for the translator which contains the numbered
messages and, carefully separated (e.g., by brackets), any additional
explanatory information or paraphrase they may need. A message that is
obvious to you may be a mystery to a translator. You should assume that
the translator:

1. Has a different native language from yours.

2. Is hundreds or thousands of kilometers away from you.

3. Is doing the translation months or years after you finish the program.

■ All text that needs to be localized should be put in the message catalog.
This includes: prompts, help text, error messages, format strings, softkey
definitions, and command names.

■ Any text that will not be localized should not be put in the message catalog.
Including unnecessary text will not affect the program behavior but it may
be confusing to a translator.

■ Provide a unique, unambiguous message for each situation. A single message
in your own language may appear to cover several different situations.
However, when the message is translated into another language, each
different situation may require a different local language translation.

■ Allow at least 60% extra space in text buffers and screen layouts to allow for
text expansion when messages are translated. It may take more space to
convey information in another language.

■ Decide what to do if a message catalog cannot be found by your program. If
the local language is vital to the operation of the program, you may want the
program to issue a default error message and exit. If the local language is
not vital to this part of your program, you might allow the program to
continue to operate with a default language (such as C).
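The last two guidelines can be expressed in code. The following is a sketch under stated assumptions rather than a prescribed interface: the function names translated_size and open_catalog, the required flag, and the wording of the error message are all hypothetical.

```c
#include <nl_types.h>
#include <stddef.h>
#include <stdio.h>

/* Buffer size for a translated message: the original length plus
 * 60% (rounded up), plus one byte for the terminating NUL. */
size_t translated_size(size_t orig_len)
{
    return orig_len + (orig_len * 60 + 99) / 100 + 1;
}

/* Open the catalog. If it cannot be found and the local language is
 * required, report the failure; otherwise let the program continue.
 * Returns 0 if the program may continue, -1 if it should exit. */
int open_catalog(const char *name, int required, nl_catd *catd)
{
    *catd = catopen(name, 0);
    if (*catd == (nl_catd)-1 && required) {
        fprintf(stderr, "%s: cannot open message catalog\n", name);
        return -1;
    }
    return 0;   /* catgets() will fall back to its default strings */
}
```

A caller for which the local language is vital would pass a nonzero required flag and exit when -1 is returned. Other callers can ignore the missing catalog and run with the default messages.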






Administering International Software 



5 



Read this chapter if you are: 

■ A Systems Administrator who supports the use or development of NLS 
software. 

This chapter covers information you will need to know and tasks you will need 
to perform in order to ensure that users on your systems are able to use NLS 
features successfully. 

Both the information and the tasks are minimal since your local NLS 
Coordinator should have already determined the required configuration and 
initialization of the system with respect to NLS. 



Finding NLS Files 

The NLS information used by HP-UX commands and libraries is located in the 
following directories and files: 






Directory/Files                      Type of NLS Information

/usr/lib/nls                         This is the directory under which NLS
                                     information is located.

/usr/lib/nls/config                  This readable ASCII file identifies currently
                                     installed locales, including user-defined
                                     locales created by buildlang. It contains
                                     locale names and their corresponding
                                     locale-ID numbers.

/usr/lib/nls/locale                  This directory is present for each installed
                                     locale.

/usr/lib/nls/locale/locale.def       These files contain locale-dependent
                                     processing information.

/usr/lib/nls/locale/*.cat            These are the localized message catalog files.



In the most general case, locale can be of the form language_territory.codeset.
Either of the extensions _territory or .codeset may be omitted if not applicable,
and in general, both are omitted. If a locale has _territory or .codeset
extensions, there is a corresponding subdirectory for each extension. For
example, if /usr/lib/nls/config has entries:

german.8859
german_swiss
german_swiss.8859
japanese
japanese.ujis

Then, you should expect to find the following directories:

/usr/lib/nls/german/8859
/usr/lib/nls/german/swiss
/usr/lib/nls/german/swiss/8859
/usr/lib/nls/japanese
/usr/lib/nls/japanese/ujis
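The mapping from locale name to directory shown in this example can be sketched in a few lines. The function name locale_dir is hypothetical, and the code assumes that "_" and "." appear in a locale name only as the separators for the territory and codeset extensions.

```c
#include <stdio.h>
#include <string.h>

/* Build the directory for a locale name of the form
 * language[_territory][.codeset]; each extension becomes a
 * subdirectory. Returns 0 on success, -1 if out is too small. */
int locale_dir(const char *locale, char *out, size_t outsize)
{
    size_t i;
    int len;

    len = snprintf(out, outsize, "/usr/lib/nls/%s", locale);
    if (len < 0 || (size_t)len >= outsize)
        return -1;
    /* Convert the separators in the locale part of the path only. */
    for (i = strlen("/usr/lib/nls/"); out[i] != '\0'; i++)
        if (out[i] == '_' || out[i] == '.')
            out[i] = '/';
    return 0;
}
```

For the entry german_swiss.8859 this produces /usr/lib/nls/german/swiss/8859, matching the directory list in the example.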






The Default User Environment 



The NLS environment variables should have system default values appropriate 
to the local user community. These values would ordinarily be determined by 
the local NLS Coordinator. You should include commands in /etc/profile
and /etc/csh.login that will set the user's environment variables to these
default values. Note that HP-UX does not set these variables.



Terminal Configuration 

Users running internationalized commands need 8-bit terminal input and
output, which the following setting provides:

stty -istrip -parity

The /etc/gettydefs entries for these users' terminals should be set
accordingly.



Installing Message Catalogs 

Localized message catalogs would ordinarily be delivered to you by the local 
NLS Coordinator. You should install these catalogs in the appropriate location. 

Message catalogs for HP-UX commands and libraries are located in
/usr/lib/nls/locale. If your system has territory or codeset specific locales
you will need to check additional directories. See the discussion in "Finding NLS
Files" above.

Message catalogs for other applications can be put in any location that can 
be referenced by the conventions of catopen and NLSPATH. The location and 
naming of local message catalogs will generally be made by you in consultation 
with the local NLS Coordinator. This location and naming may require a 
change to the system default value of NLSPATH. If it does, the NLS Coordinator 
will determine the new value. You will need to make the required change to the 
NLSPATH setting in /etc/profile and /etc/csh.login and you will want to
notify users of this change. 






Installing Optional Locales 



The procedure for installing additional software such as an NLS locale is
explained in detail in the section "Updating HP-UX" of HP-UX System
Administration Tasks.

HP-UX is shipped with the default locale, C. For specific locations, other 
locales may also be shipped. If you install other products, however, you must 
order the specific locales for them as an additional option. Not all character 
sets are supported on all peripherals, so peripherals which support the desired 
character set must also be obtained. After a locale is installed, the NLS 
locale-specific information can be used by any application program requesting 
it. 



Peripheral Configuration 

When you purchase peripherals for use in a non-ASCII or multiple language
environment, you should consider the character sets that your peripheral(s) will
need to support. Hewlett-Packard provides printers, plotters and terminals
which support HP single- and multi-byte character sets, as well as non-HP
standards (such as the ISO 8859-1 character set for Europe). In some cases, you
may need special software in order to operate these peripherals, such as the
NLIO system for Asian peripherals.

Because of these considerations, the information below is provided to help in
understanding the special characteristics of non-ASCII peripherals. For further
information, you can contact your local HP sales representative for assistance.

European Character Sets 

For European languages, many HP peripherals support the ROMAN8 character 
set. ROMAN8 is a full superset of ASCII and offers 88 additional local
language symbols. Older HP peripherals may use the HP Roman Extension
set, which is a subset of ROMAN8. Roman Extension is missing ROMAN8
characters A through I, U, U, Q, ¥, §, /, and A through ±.

ROMAN8 terminals can simultaneously display any characters in the set. The
keyboards have keycaps only for the specified local language, but, in the 8-bit
mode, you can enter any ROMAN8 character by use of the (Extend char) key. You
can also use most 8-bit terminals in ISO7 mode.

Katakana Character Sets 

Many HP peripherals support a base 8-bit character set known as KANA8. 
The first 128 codes in the KANA8 set are JASCII (the same as ASCII except
that the set substitutes "¥" for "\"), and the last 128 codes are available for
Katakana.

Other 8-bit HP Character Sets 

As with KANA8, the other 8-bit character sets supported by HP have ASCII 
as the first 128 codes, with the last 128 codes used by other characters. Some 
Arabic printers are capable of context-sensitive letters, so some character 
shapes may vary on these devices. 

16-bit HP Character Sets 

For Asian languages, many HP peripherals support one of five HP-16 character
sets. These character sets are compatible with the five HP-15 character sets
(PRC15, ROC15, JAPAN15, UJIS, and KOREA15). NLIO is required for
converting between HP-15 and HP-16 during input and output. NLIO is also
necessary with some Asian terminals to provide the "input method" by which
a user can input multibyte characters using a conventional keyboard. Certain
peripherals, such as PCs used as terminals, can generate and display HP-15
multibyte characters directly and need no additional software.

Non-HP 7-Bit Character Sets 

The ISO7 (International Standards Organization 7-bit character substitution)
and similar character sets have certain infrequently-used ASCII codes, such as
those for "|" and "{", designated to generate local-language symbols. Examples
are the ø or æ in Danish. Unfortunately, the designated ASCII codes also
represent special characters often used in HP-UX (and all other UNIX and
UNIX-like systems). For this reason, the use of ISO 7-bit and similar non-HP
international character sets is neither recommended nor supported.






Limited support for non-HP 8-bit character sets may be provided through
appropriate language definitions. Currently this definition must be provided
by the user. The buildlang utility described in the chapter "Localizing
International Software" in this manual provides help in defining your own
language and locale characteristics.






Localizing International Software 



6 



Read this chapter if you are: 
■ A local NLS Coordinator. 

The chapter covers information and tasks for localizing commands that have 
been internationalized. It will also help you in determining local NLS needs 
which you may need to communicate to your System Administrator. 



Localizing the User Environment 

HP-UX does not automatically set NLS environment variables. HP-UX 
commands, when run with NLS environment variables not set, default to the 
C locale. If this is the desired system default locale, no changes for the user 
environment are needed. 

To provide a different system default locale, you will need to specify the desired 
default values for the NLS environment variables: 

■ LANG 

■ LC_categories 

■ NLSPATH 

■ LANGOPTS 

The chosen values should be those most commonly used. The default values 
should be set in /etc/profile and /etc/csh.login. You should arrange with
your system administrator to do this and advise users of any change to the 
system default. 

Users who need an environment different from the system default can set their 
own environment as needed in their .profile or .login file. 






Localizing Message Catalogs 

For applications that have message catalog support, you can provide a local 
language interface. This involves: 

■ Obtaining a copy of the C locale messages. 

■ Arranging for translation of the messages into a local language. 

■ Installing a message catalog containing the translated messages. 

The C Locale Messages 

To determine which HP-UX commands have message catalogs, run:

ls /usr/lib/nls/C/*.cat

For each HP-UX command that has message catalog support, there will be a
file /usr/lib/nls/C/command.cat listed.

To localize a message catalog, you need to first get a readable version of the C
locale messages. This is done with the dumpmsg command.

For example, to get a message text source file of the C locale messages for date
run:

dumpmsg /usr/lib/nls/C/date.cat >date.msg

The file date.msg is a copy of the messages and is ready for translation to a 
native language. 

Preparing for Translating Messages 

You are now ready to translate the messages to the target language: 
vi date.msg 

Note that date.msg is a message text source file in a format suitable for input
to gencat. You must preserve the format and you must leave the message
numbers and the set numbers unchanged.

The developer should have provided a translator's "cookbook". Lacking this,
here are some possible translation problems you might encounter:






■ The meaning of a message may be unclear or ambiguous so that the desired 
translation is not apparent. 

■ There may be unspecified size constraints on the message. For example, it 
may be displayed in a space with a fixed length. 

■ There may be parts of a message that should not be translated. For 
example, messages for a command may contain the command name. 

Some possible solutions you might try: 

■ Experiment with the program to see if you can determine the intended 
behavior. 

■ Communicate with the developer of the program.

■ Communicate with someone who has localized the program. 

Installing Localized Messages 

Once the message text source file has been translated to the target language 
you can generate a message catalog containing the newly translated messages. 
To create a message catalog from the translated date.msg message text source 
file, run: 

gencat date.cat date.msg

The new message catalog date.cat can now be delivered to your System
Administrator for installation in the appropriate locale.

Note that a message catalog contains no information to indicate the locale for 
which it is intended. To help ensure that the message catalog is installed in 
the proper directory, we recommend you deliver the catalog with a script that 
will install the catalog in the correct locale. Once the new message catalog is 
installed, be sure to verify the correct installation. 






Creating a Locale 

The standard locales cover most languages. In the event that none of the 
existing locales is appropriate, it is possible to create a locale that meets your 
specific requirements. This is most easily done if there is an existing locale 
that is similar to the one you need. If there is, you can get a copy of the locale 
description in buildlang format, modify the description so that it conforms to 
your needs, then install it as a new locale. 

For example, suppose you need a locale that is the same as american except 
that it is to have a different date format. 

For the american locale, date produces output of the form: 
Fri, May 5, 1989 04:37:33 PM 

Suppose the desired format is: 

Fri, 5 May 1989, 04:37:33 PM 

The format for date is controlled by the d_t_fmt and d_fmt items of the 
LC_TIME category. You can change these to give the desired format. 

To create the new locale, get a buildlang script of the american locale by 
executing: 

buildlang -d american > new_locale 

You can now modify the buildlang script new_locale to define the desired 
locale: 

vi new_locale 






The script will contain the following entries:

langname "american"
langid 1

LC_TIME
d_t_fmt "%a, %b %.1d, %Y %I:%M:%S %p"
d_fmt "%a, %b %.1d, %Y"
t_fmt "%I:%M:%S %p"
day_1 "Sunday"
END_LC

To get the desired formatting, you need to change d_t_fmt and d_fmt in the
script to:

langname "locale_name"
langid locale_id

LC_TIME
d_t_fmt "%a, %.1d %b %Y %I:%M:%S %p"
d_fmt "%a, %.1d %b %Y"
t_fmt "%I:%M:%S %p"
END_LC
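The effect of reordering the fields can be previewed with the standard C strftime() routine, as in the sketch below. This is an approximation and not the mechanism buildlang uses: standard strftime() has no %.1d directive, so %d is used instead and the day appears with a leading zero. The macro names and the function format_sample are hypothetical.

```c
#include <stddef.h>
#include <time.h>

#define AMERICAN_FMT "%a, %b %d, %Y %I:%M:%S %p"   /* month before day */
#define DESIRED_FMT  "%a, %d %b %Y %I:%M:%S %p"    /* day before month */

/* Format 4:37:33 PM on Friday, May 5, 1989 with the given format.
 * Returns the number of characters written, as strftime() does. */
size_t format_sample(const char *fmt, char *out, size_t outsize)
{
    struct tm t = {0};

    t.tm_year = 89;      /* years since 1900 */
    t.tm_mon = 4;        /* May (months count from 0) */
    t.tm_mday = 5;
    t.tm_wday = 5;       /* Friday */
    t.tm_hour = 16;
    t.tm_min = 37;
    t.tm_sec = 33;
    return strftime(out, outsize, fmt, &t);
}
```

In the C locale, AMERICAN_FMT gives "Fri, May 05, 1989 04:37:33 PM" and DESIRED_FMT gives "Fri, 05 May 1989 04:37:33 PM".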

You also need to determine locale_name and locale_id. If you want to create a
new locale, these must not conflict with existing locales and the locale_id must
be in the range 901-999. If you want to replace an existing locale with a new
definition, these must be the locale_name and locale_id of the locale that is to
be replaced.

After you have changed new_locale, the locale_name locale can be installed in
the system by executing:

buildlang new_locale

You may need to be root to do this or you can deliver new_locale to your
System Administrator for installation.

To verify correct installation of the new locale: 






■ Run nlsinfo to see that the new locale is displayed.

■ Examine /usr/lib/nls/config to see that locale_name is listed with
locale_id.

■ Verify that a directory /usr/lib/nls/locale_name exists.

■ Verify that a file /usr/lib/nls/locale_name/locale.def exists.

■ Set LANG to the locale_name locale and verify that date formats the date
as desired.






Advanced NLS Topics 



7 



Read this chapter if you are: 

■ A programmer or software developer who has special requirements 

■ Anyone in need of additional background information on NLS 

This chapter covers the following: 

■ Character and string processing in more detail 

■ Special requirements for localizing 

■ Special situations for messaging 



Codeset Conversion 

If you need to transport data between systems that use different codesets, you 
will probably need to convert codesets. To assist this conversion, two codeset 
conversion tools are available. 

The iconv command operates on files and converts characters from one codeset
to another. Conversion can be performed between HP codesets and a number
of widely-used non-HP codesets. See iconv(1) in the HP-UX Reference for
details.

The iconv routines are intended for special situations not covered by the
conversion command. Using these routines, it is possible to provide special
conversion tables and special treatment that may be needed in the conversion.
See iconv(3C) in the HP-UX Reference for details.
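
As a rough idea of what programmatic codeset conversion looks like, the sketch below uses the POSIX iconv_open(), iconv(), and iconv_close() interface. This is an assumption made for illustration: it is not the iconvopen/ICONV routine set described in iconv(3C) for this release, and the codeset names "ISO-8859-1" and "UTF-8" and the helper name latin1_to_utf8 are likewise not taken from this manual.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert a NUL-terminated ISO 8859-1 string
 * to UTF-8.  Returns 0 on success, -1 on failure. */
int latin1_to_utf8(const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *) in;    /* iconv() takes non-const pointers */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    size_t rc;

    if (outsize == 0)
        return -1;
    outleft = outsize - 1;      /* reserve room for the terminator */

    cd = iconv_open("UTF-8", "ISO-8859-1");     /* (to, from) */
    if (cd == (iconv_t) -1)
        return -1;

    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t) -1)
        return -1;              /* bad input or output buffer too small */

    *outp = '\0';
    return 0;
}
```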






Processing Right-to-Left Languages 

Processing right-to-left languages requires the programmer to deal with issues 
of data directionality that are not ordinarily a concern. 

Directionality refers to two properties of the text: 

■ The direction the language is naturally read. 

■ The order of characters in a file. 

Mode can be:

■ Latin: left-to-right.

■ Non-Latin: right-to-left.

Order can be:

■ Keyboard: the order in which the user enters keystrokes.

■ Screen: the order in which characters are displayed.

Some codesets contain Latin and non-Latin characters so that it is possible
to mix left-to-right and right-to-left text. If we use Li to indicate a Latin
character, Ni to indicate a non-Latin character, and i to indicate the order in
which the character is typed, the mixed text:

N1 N2 L3 L4 N5 N6 L7 L8

entered on a terminal configured for right-to-left display would appear as:

L7 L8 N6 N5 L3 L4 N2 N1

For additional information on directionality, see hpnls(5) in the HP-UX 
Reference. 
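
The mixed-text example above can be reproduced with a small model: reverse the whole line, then reverse each run of Latin characters back into reading order. The sketch below does only that. It is a simplified illustration, not the algorithm used by forder or strord, and it assumes (purely for the demonstration) that uppercase ASCII letters stand for Latin characters and lowercase letters for non-Latin ones.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Reverse the characters of s in the half-open range [lo, hi). */
static void reverse_range(char *s, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        char t = s[lo];
        s[lo++] = s[--hi];
        s[hi] = t;
    }
}

/* Hypothetical helper: rearrange a keyboard-order line, in place,
 * into the order shown on a right-to-left display. */
void keyboard_to_screen(char *s)
{
    size_t n = strlen(s);
    size_t i = 0;

    /* Step 1: the display runs right to left, so reverse everything. */
    reverse_range(s, 0, n);

    /* Step 2: Latin runs still read left to right, so restore them. */
    while (i < n) {
        if (isupper((unsigned char) s[i])) {
            size_t j = i;
            while (j < n && isupper((unsigned char) s[j]))
                j++;
            reverse_range(s, i, j);
            i = j;
        } else {
            i++;
        }
    }
}
```

With a=N1, b=N2, C=L3, D=L4, e=N5, f=N6, G=L7, H=L8, the input "abCDefGH" becomes "GHfeCDba", which matches the display order given in the example.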

Two commands are available to manage data directionality. The command
forder allows users with screen data to use programs that do not support
screen order data. It converts the order of characters in a file from screen
order to keyboard order, or from keyboard to screen order. For example, sort
cannot sort screen order data. However, such data could be sorted by:

forder file1 |   # put in keyboard order for sort

sort |           # sort it

forder > file2   # put back in screen order

Order and mode information is specified by the LANGOPTS environment 
variable. To set LANGOPTS using Bourne Shell or Korn Shell: 






LANGOPTS=mode_order
export LANGOPTS 

For further details on the LANGOPTS environment variable, see environ(5).

Since most printers are designed for printing left-to-right languages, printing 
right-to-left data requires special formatting. The command nljust provides 
this special formatting. It aligns such data with the right margin and composes 
the data in right-to-left print order. For example, nljust would typically be 
used as a filter with the lp and pr commands, such as in: 

pr file | nljust - | lp

As with forder, nljust also gets mode and order information from the
LANGOPTS variable.

For special situations that cannot be handled by data ordering commands, the 
routine strord converts between screen order and keyboard order and can be 
used to provide any special processing that may be needed. As a simplified 
example, consider a program that reads data in either keyboard or screen 
order, and writes it to a terminal in screen order. The relevant portions of the 
program are: 

#include <nl_types.h>
char *lopts;

lopts = getenv("LANGOPTS");     /* "m_o" m = mode, o = order */

fscanf( ... , src, ... );       /* read in current mode/order */

if (lopts[2] == 'k')            /* if order is keyboard order */

    strord(dst, src, lopts[0]); /* re-order before write */

fprintf( ... , dst, ... );      /* write data */

For an extended example of right-to-left processing see Appendix A, "Examples 
of Internationalized Software", in this manual. 
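
The fragment above indexes the LANGOPTS string directly. A slightly more defensive parse might look like the following sketch. The struct langopts type and the parse_langopts name are hypothetical; the letters l/n for mode and k/s for order, and the fallback to non-Latin mode and keyboard order, are assumptions borrowed from the rtlcat example in Appendix A and are not a statement of what environ(5) guarantees.

```c
#include <stddef.h>

/* Hypothetical result type: mode is 'l' (Latin) or 'n' (non-Latin),
 * order is 'k' (keyboard) or 's' (screen). */
struct langopts {
    char mode;
    char order;
};

/* Parse a LANGOPTS value of the form "m_o".  Missing or unrecognized
 * fields keep the fallback values 'n' and 'k'. */
struct langopts parse_langopts(const char *value)
{
    struct langopts lo = { 'n', 'k' };

    if (value == NULL || value[0] == '\0')
        return lo;
    if (value[0] == 'l' || value[0] == 'n')
        lo.mode = value[0];
    /* Only look at value[2] once value[1] is known to exist. */
    if (value[1] == '_' && (value[2] == 'k' || value[2] == 's'))
        lo.order = value[2];
    return lo;
}
```

A caller would typically pass the result of getenv("LANGOPTS"), which may be a null pointer when the variable is not set.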






Locale Information 



Locale information is available in various ways. The nlsinfo command
provides selected portions of information for a specified locale. Information is
displayed in tabular form convenient for reference. The buildlang command
-d option provides all information for a specified locale. This information is
displayed in buildlang input format and may be used to define a new locale.

Programmatic access to information about the currently active locale is
provided by three library routines. The nl_langinfo() routine provides access to
all locale information. The localeconv() routine provides access to the locale
information that pertains to numeric formatting. The getlocale() routine
provides access to setlocale() status information. See setlocale(3C).
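
As a small illustration of programmatic access, the sketch below asks localeconv() for the decimal-point string of a named locale. The helper name decimal_point_of is hypothetical, and only the standard C setlocale() and localeconv() calls are used; note that the helper changes the program's LC_NUMERIC setting as a side effect.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: select locale_name for LC_NUMERIC and return
 * its decimal-point string, or NULL if the locale cannot be set. */
const char *decimal_point_of(const char *locale_name)
{
    if (setlocale(LC_NUMERIC, locale_name) == NULL)
        return NULL;
    return localeconv()->decimal_point;
}
```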



Initialization 

The following sections provide more detailed information on: 

■ Special locales 

■ Special Message Catalogs 

■ Default Message Catalogs 

■ Programs That Call Exec 

Special Locales 

The setlocale() routine can set individual categories to specific locale values.
For example, to have a program run with French date and time conventions
and with Spanish sorting conventions, the following calls would establish the
desired locale:

#include <locale.h>

setlocale(LC_TIME, "french");
setlocale(LC_COLLATE, "spanish");

This use, however, defeats the adaptive nature of the NLS routines and is not
recommended. A preferred way to get the desired effect would be to use the
"standard" initialization and to set the NLS environment variables when the
program is run:

LC_TIME=french ; export LC_TIME
LC_COLLATE=spanish ; export LC_COLLATE

Special Message Catalogs 

The catopen() routine can specify a path for the message catalog, as in:

catd = catopen("/usr/special.cat", 0);

This use, however, defeats the generality of catopen() and is not
recommended. A preferred way to get the desired effect would be to use the
"standard" initialization:

catd = catopen("special", 0);

Then set the NLS environment variable when the program is run:

NLSPATH="/usr/%N.cat" ; export NLSPATH

Default Message Catalogs 

The "standard" default message handling is to use the C locale messages as the
default string in catgets() calls. This ensures that the program will be able to
issue messages even if there is no message catalog available.
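
The fallback behavior can be sketched as a thin wrapper around catgets(). The wrapper name get_message is hypothetical, and the explicit test for a failed catopen() is a precaution added here rather than something catgets() is documented to require.

```c
#include <nl_types.h>

/* Hypothetical wrapper: return message `num` of set `set` from the
 * catalog, or the built-in C locale text when no catalog is open or
 * the message is missing. */
const char *get_message(nl_catd catd, int set, int num, const char *dflt)
{
    if (catd == (nl_catd) -1)       /* catopen() failed earlier */
        return dflt;
    return catgets(catd, set, num, dflt);
}
```

Because the default string travels with every call, the program can still print a usable message when the catalog is absent.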

If your application must access a C message catalog for the default messages,
the following is suggested:

if (!setlocale(LC_ALL, "")) {
    fputs("Warning! call to setlocale failed\n", stderr);
    fputs("Continuing processing using the \"C\" locale\n", stderr);
    catd = (nl_catd)-1;
}
else
    catd = catopen("name", 0);
if (catd == (nl_catd)-1) {
    /* if necessary, user may save LANG at this point */
    putenv("LANG=C");
    /* try NLSPATH */
    catd = catopen("name", 0);
    /* if necessary, user may restore LANG at this point */
    if (catd == (nl_catd)-1)
        /* try hard-coded path */
        catd = catopen("/usr/lib/nls/C/name.cat", 0);
}



Programs That Call Exec 

For commands that exec() other commands, we recommend that the first
command call setlocale(). If the call is unsuccessful, use putenv() to reset
all the NLS environment variables to ensure that the other commands don't
repeat the unsuccessful setlocale() call and issue additional error messages.
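
One way to follow this recommendation is sketched below. The function name and the list of variables are assumptions made for illustration (the set of NLS variables on a given release may differ), and the POSIX setenv() call is used in place of putenv() only because it copies its arguments.

```c
#include <locale.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: initialize the locale from the environment.
 * On failure, reset the NLS variables to "C" so that programs started
 * later with exec() do not repeat the failed setlocale() call.
 * Returns 1 if the environment's locale was accepted, 0 otherwise. */
int init_locale_for_children(void)
{
    /* Assumed list of NLS environment variables. */
    static const char *const vars[] = {
        "LANG", "LC_ALL", "LC_COLLATE", "LC_CTYPE",
        "LC_MONETARY", "LC_NUMERIC", "LC_TIME"
    };
    size_t i;

    if (setlocale(LC_ALL, "") != NULL)
        return 1;

    for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
        (void) setenv(vars[i], "C", 1);
    (void) setlocale(LC_ALL, "C");
    return 0;
}
```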



Messaging: printf/scanf Data Formatting 

Messages that contain run-time data will often need to be rearranged for 
display in different locales. For example, the following statement displays the 
date in C locale format: 

printf("%d/%d/%d\n", mo, dy, yr);

and would give the following result:

10/31/87

If this date were displayed in the U.K., the english locale, it would need to
appear as:

31/10/87

which could be done with a statement such as:

printf("%d/%d/%d\n", dy, mo, yr);

This solution, however, requires a change to the source program: the order of 
the printf arguments must be changed. 

To provide flexible formatting of data, the printf(3C) family of routines permits
a conversion specification of the form %n$ to indicate that conversion should be
applied to the nth argument. For the C locale, we can use:

printf("%1$d/%2$d/%3$d\n", mo, dy, yr);

and for the english locale, we can use:

printf("%2$d/%1$d/%3$d\n", mo, dy, yr);

This solution leaves the order of the printf arguments unchanged. It does
require a change to the format string but the format string can be treated as a
message and modified as needed for each locale. So our solution becomes:

printf((catgets(catd, NL_SETN, 17, "%1$d/%2$d/%3$d\n")), mo, dy, yr);

Then, the C locale message catalog would contain:

17 %1$d/%2$d/%3$d\n

And the english locale message catalog would contain:

17 %2$d/%1$d/%3$d\n

The %n$ conversion specification is also available in the scanf(3C) family of
routines.
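
The effect of the %n$ form can be checked in isolation with a sketch like the one below, in which the format string is passed in as data (as it would be after a catgets() call) while the argument order stays fixed. The helper name format_date is hypothetical, and the sketch assumes a C library whose printf family supports numbered arguments.

```c
#include <stdio.h>

/* Hypothetical helper: format a date with a caller-supplied format.
 * The arguments are always passed as month, day, year; the format
 * string alone decides the order in which they appear. */
int format_date(char *buf, size_t size, const char *fmt,
                int mo, int dy, int yr)
{
    return snprintf(buf, size, fmt, mo, dy, yr);
}
```

Called with mo = 10, dy = 31, yr = 87, the format "%1$d/%2$d/%3$d" yields 10/31/87 and the format "%2$d/%1$d/%3$d" yields 31/10/87.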






A 

Examples of Internationalized Software 



Example 1: Rtlcat 

The following is the first of two example programs given to illustrate the usage 
of NLS routines. 



Program Description and Comments: 

/* 

** This program is used to illustrate several Internationalization 

** features including: 

** - message catalogs 

** - setlocale(3C)

** - right-to-left processing 

** - some multi-byte in get_basename() 

** Syntax: 

** rtlcat [options] [files ...]
** Options:

** -l: force file mode to Latin

** -n: force file mode to Non-Latin 

** -k: force file order to keyboard 

** -s: force file order to screen 

** Description: 

** Do a right-to-left cat.

** 

** Rtlcat reads the concatenation of input files (or standard 
** input if none are given) and displays the input on standard 
** output. If "-" appears as an input file name, rtlcat reads 
** standard input at that point. You can use "--" to delimit
** the end of options. 
** 

** The text orientation (mode) of a file can be right-to-left 
** (non-Latin) or left-to-right (Latin) . This text orientation 
** can affect the way data is arranged in the file. The data 
** arrangements that result are called screen order and 
** keyboard order. 






** 

** Rtlcat determines the mode and order of the input files and 
** the terminal. The file mode/order is gotten from the LANGOPTS 
** environment variable (environ(5)) . The terminal mode/order 
** is obtained from the primary and secondary status bytes 
** that result when the terminal is asked about its alpha-numeric 
** capabilities. This inquiry is done only on hp150 and hp2392
** terminals. Rtlcat assumes the terminal is the stdout device. 
** 

** If the input file mode/order and the terminal mode/order are 

** the same, then a simple copy is done. If the input file order 

** and the terminal order are different but their modes are the same, 

** then the input file data is rearranged by strord(3c) so it displays 

** properly on the terminal screen. If the input file mode and the 

** terminal mode are different, rtlcat simply stops with an error 

** message. It is not defined what a Non-Latin file should look like 

** when it is displayed on a terminal configured for Latin mode 

** (or vice versa) . 

*/ 



Include Files: 

#include <stdio.h> /* input - output */ 

#include <string.h> /* string function declarations */ 

#include <varargs.h> /* variable arguments */ 

#include <termio.h> /* for ioctl call */ 

#include <nl_types.h> /* for nl_catd */

#include <nl_ctype.h> /* for ADVANCE */ 

#include <locale.h> /* for setlocale */ 

#include <langinfo.h> /* for nl_langinfo */ 



External Declarations: 

extern nl_catd catopen(); /* open message catalog */
extern char *catgets(); /* get message from catalog */
extern int catclose(); /* close message catalog */
extern char *_errlocale(); /* get bad locale settings */
extern void perror(); /* system error messages */
extern void exit(); /* leave */

extern int optind; /* argv index of next arg */ 

extern int opterr; /* error message indicator */ 

extern int errno; /* error number */ 

extern int sys_nerr; /* max error number */ 

extern char *getenv() ; /* get environment variable */ 






extern char *strord() ; /* change data order */ 



Forward References: 

extern void Perror(); /* local system print error message */
extern void error(); /* local system error message */
extern char *get_basename(); /* get basename of command name */
extern int copy(); /* copy file */

extern int reorder () ; /* rearrange input file data */ 



General Constants: 

#define WARNING 0 /* warning error message */ 
#define FATAL 1 /* fatal error message */ 
#define GOOD 0 /* successful return value */ 
#define BAD -1 /* unsuccessful return value */
#define TRUE 1 /* boolean true */ 
#define FALSE 0 /* boolean false */ 



Limits: 

#define MAX_ERR 256 /* max Perror message length */

#define MAX_TBUF 128 /* max tbuf length */

#define MAX_LINE 1024 /* max input line length */



Right-to-Left Terminal Constants: 

#define an_cap "\033*s-1~" /* request alpha-numeric capabilities */
#define sec_status "\033~" /* secondary status */
#define on_straps "\033&s1g1H" /* strap G & H on -- no handshake */
#define off_straps "\033&s0g0H" /* strap G & H off -- D1 */

#define DISPLAY 2 /* alpha-num display byte */
#define ORDER 0x10 /* alpha-num display ordering bit */
#define RTL_SEC 8 /* 2nd status byte 13 */
#define MODE 0x08 /* 2nd status mode bit */






Error Message Numbers: 



#define NL_SETN 1 /* message catalog set number */ 
#define BAD_USAGE 1 /* usage error message */ 
#define NOT_RTL_LANG 2 /* not a right-to-left language */ 
#define NOT_RTL_TERM 3 /* not a right-to-left terminal */ 
#define BAD_MODE 4 /* terminal/file mode disagreement */

Error Message Strings: 

static char *Message[] = { 
"usage: %s [-lnks] [files ...]\n", /* catgets 1 */
"\"%s\" not a right-to-left language\n", /* catgets 2 */
"\"%s\" not a right-to-left terminal\n", /* catgets 3 */
"mode of terminal and mode of file do not agree\n", /* catgets 4 */

};

Types: 

typedef int (*PFI) () ; /* ptr to function returning int type */ 

Global Variables: 

static char *Progname; /* program name */ 

static char **Filename; /* ptr to ptr to current file name */ 

static FILE *Input = stdin; /* input file pointer (assume stdin) */ 

static PFI Process; /* routine to do the process */ 

static nl_catd Catd; /* message catalog descriptor */ 

static nl_mode File_mode; /* mode of file (Latin or Non-Latin) */ 

Main Program: 



/*
** main()
**
** description:
** driver routine for program
**
** assumptions:
** all input come from stdin or named files
** all output goes to stdout
** all errors go to stderr
** the terminal screen is the stdout device
** mode and order of the input files is given in LANGOPTS
**
** global variables:
** Input: FILE pointer to the current input file
** Filename: ptr to ptr to current file name
**
** return value:
** 0: everything went ok
** -1: had some trouble
*/

main(argc, argv)
int argc;               /* initial argument count */
char **argv;            /* ptr to ptr to first program argument */
{
    /* assume a successful return value */
    register int retval = GOOD;

    /* initialize, parse cmd line options, get input files, etc. */
    if (start(argc, argv) == BAD) {
        retval = BAD;
    }

    /* open and process input files one at a time */
    for ( ; *Filename; Filename++) {

        /* open input file and get next if can't open */
        if (!strcmp(*Filename, "-")) {
            Input = stdin;
        }
        else if (!(Input = fopen(*Filename, "r"))) {
            Perror("fopen");
            retval = BAD;
            continue;
        }

        /* process the file */
        if ((*Process)() == BAD) {
            retval = BAD;
        }

        /* close input file unless it's stdin */
        if (Input != stdin) {
            if (fclose(Input) == EOF) {
                Perror("fclose");
                retval = BAD;
            }
        }
    }

    /* end the program */
    if (finish() == BAD) {
        retval = BAD;
    }

    return retval;
}

/* 

** start () 

** 

** description: 

** set up language tables 

** open message catalogs 

** parse command line 

** set up global variables 

** 

** global variables: 

** Catd: nl_catd message catalog descriptor 

** Progname: char pointer to the program name 

** Filename: pointer to pointer to current file name 

** File_mode: mode (Latin or Non-Latin) of the current input file

** 

** return value: 

** 0: everything went ok 

** -1: had some trouble 

*/ 

static int
start(argc, argv)
int argc;               /* current argument count */
char **argv;            /* ptr to ptr to current argument */
{
    nl_mode term_mode;      /* mode of terminal (Latin-Non-Latin) */
    nl_order term_order;    /* order of terminal (Key-Screen) */
    nl_order file_order;    /* order of file (Key-Screen) */
    char *termname;         /* terminal name from TERM */
    char *lopts;            /* language options from LANGOPTS */
    int optchar;            /* option character for getopt(3C) */
    static char *deffiles[] = { "-", (char *) NULL };
                            /* default input file name */

    /* get the program base name in case it is renamed via ln(1) */
    Progname = get_basename(*argv);

    /* get locale & initialize environment table */
    if (!setlocale(LC_ALL, "")) {
        /* bad initialization */
        (void) fputs(_errlocale(), stderr);
        Catd = (nl_catd) -1;
        (void) putenv("LANG=");     /* for perror */
    }
    else
        /* good initialization: open message catalog,
        ... use hardcoded name for first parameter,
        ... keep on going if it isn't there */
        Catd = catopen("rtlcat", 0);

    /* get file mode and order from LANGOPTS */
    if ((lopts = getenv("LANGOPTS")) == NULL || *lopts == '\0') {
        /* if not set assume Non-Latin mode, keyboard order */
        lopts = "n_k";
    }

    /* and do a lazy parse */
    File_mode = lopts[0] == 'l' ? NL_LATIN : NL_NONLATIN;
    file_order = lopts[2] == 'k' ? NL_KEY : NL_SCREEN;

    /* parse command line options
    ... and possibly override file mode and order */
    opterr = 0;             /* disable getopt error message */

    while ((optchar = getopt(argc, argv, "lnks")) != EOF) {
        switch (optchar) {
        case 'l':           /* force latin mode */
            File_mode = NL_LATIN;
            break;
        case 'n':           /* force non-latin mode */
            File_mode = NL_NONLATIN;
            break;
        case 'k':           /* force keyboard order */
            file_order = NL_KEY;
            break;
        case 's':           /* force screen order */
            file_order = NL_SCREEN;
            break;
        case '?':           /* unrecognized option */
            error(FATAL, BAD_USAGE, Progname);
        }
    }

    /* initialize process routine */
    if (strcmp(nl_langinfo(DIRECTION), "1")) {
        /* do not have a right-to-left language:
        ... print a warning and do a copy */
        char *langname;

        if ((langname = getenv("LANG")) == NULL || *langname == '\0') {
            /* if not set assume C language */
            langname = "C";
        }
        error(WARNING, NOT_RTL_LANG, langname);
        Process = copy;
    }
    else if (!rtl_term(&term_mode, &term_order, &termname)) {
        /* do not have a right-to-left terminal:
        ... print a warning and do a copy */
        error(WARNING, NOT_RTL_TERM, termname);
        Process = copy;
    }
    else if ((File_mode == term_mode) && (file_order == term_order)) {
        /* mode the same, order the same: a regular copy */
        Process = copy;
    }
    else if ((File_mode == term_mode) && (file_order != term_order)) {
        /* mode the same, order different: must change the order */
        Process = reorder;
    }
    else {
        /* Currently it is undefined what should happen when
        ... the file mode and the terminal mode are different. */
        error(FATAL, BAD_MODE);
    }

    /* set up input file arguments */
    Filename = ((argc - optind) < 1) ? deffiles : argv + optind;
    return GOOD;
}

/* 

** finish () 

** 

** description: 

** get ready to leave: close message catalogs 

**

** global variables: 

** Catd: nl_catd message catalog descriptor 

** 

** return value: 

** 0: everything went ok 

** -1: had some trouble 

*/ 

static int
finish()
{
    /* close the message catalog
    ... and do not complain about a missing catalog */
    (void) catclose(Catd);

    return GOOD;
}

/* 

** copy()

** 

** description: 

** Input file and terminal have the same mode and the same order. 

** Just copy it to stdout . 

** 

** global variables: 

** Input: FILE pointer to the current input file 

** 

** return value: 

** 0: everything went ok 

** -1: had some trouble 

*/ 






static int
copy()
{
    char line[MAX_LINE];

    while ((fgets(line, MAX_LINE, Input)) != NULL) {
        if (fputs(line, stdout) == EOF) {
            Perror("fputs");
            return BAD;
        }
    }

    return GOOD;
}

/* 

** reorder () 

** 

** description: 

** Input file and terminal have the same mode but the order is different. 

** Rearrange the input file line with strord(3c) and copy it to stdout. 

** 

** global variables: 

** Input: FILE pointer to the current input file 

** File_mode: mode (Latin or Non-Latin) of the current input file 

** 

** return value: 

** 0: everything went ok 

** -1: had some trouble

*/ 

static int
reorder()
{
    char line[MAX_LINE];
    char new_line[MAX_LINE];

    while ((fgets(line, MAX_LINE, Input)) != NULL) {
        if (fputs(strord(new_line, line, File_mode), stdout) == EOF) {
            Perror("fputs");
            return BAD;
        }
    }

    return GOOD;
}



/* 

** Perror()

** 

** description: 

** set up string with program name and the failed routine name 

** display system error message on stderr using perror(3) 

** 

** assumption: 

** perror string before the colon will not exceed MAX_ERR 

** 

** global variables: 

** Progname: char pointer to the program name 

** 

** return value: 

** no return value 

*/ 

/* VARARGS 1 */

static void
Perror(rname)
char *rname;            /* bad routine name */
{
    char pstr[MAX_ERR];     /* perror string before the colon */

    /* set up perror string */
    (void) sprintf(pstr, "%s (%s)", Progname, rname);

    /* print the system message or errno */
    if (errno > 0 && errno < sys_nerr) {
        perror(pstr);
    }
    else {
        (void) fprintf(stderr, "%s: errno = %d\n", pstr, errno);
    }
}

/*
** error()
**
** description:
** display error message on stderr and leave if fatal
** get message from a message catalog (catgets(3C))
**
** assumptions:
** all errors go to stderr
**
** global variables:
** Progname: char pointer to the program name
** Message: array of char pointers to format string messages
** Catd: message catalog descriptor
**
** return value:
** no return value
*/



/* VARARGS 2 */

static void
error(fatal, num, va_alist)
int fatal;              /* Warning or Fatal error */
int num;                /* message number */
va_dcl                  /* optional arguments */
{
    register char *fmt;     /* points to format string */
    va_list args;           /* points to optional argument list */

    /* set up the optional argument list */
    va_start(args);

    /* sync stdout with stderr */
    if (fflush(stdout) == EOF) {
        Perror("fflush");
    }

    /* get the message format string */
    fmt = catgets(Catd, NL_SETN, num, Message[num - 1]);

    /* print the program name on stderr */
    if (fprintf(stderr, "%s: ", Progname) < 0) {
        Perror("fprintf");
    }

    /* print the error message on stderr */
    if (vfprintf(stderr, fmt, args) < 0) {
        Perror("vfprintf");
    }

    /* close down the optional argument list */
    va_end(args);

    /* leave if a fatal error */
    if (fatal) {
        (void) finish();
        if (fclose(Input) == EOF) {
            Perror("fclose");
        }
        exit(BAD);
    }
}

/* 

** get_basename() 

** 

** description: 

** get the basename of the command 

** 

** assumptions: 

** the command name may have multi-byte characters 

** 

** return value : 

** ptr to start of base name 

*/ 

static char * 
get_basename( p) 

char *p; /* ptr to start of command name */ 

{ 

char *slash; /* pointer to char after slash */ 

for (slash = p; *p; ADVANCE(p)) {
if (CHARAT(p) == '/') {
slash = p + 1 ; 

} 

} 

return slash; 

} 






/* 

** rtl_term() 

** 

** description: 

** right-to-left terminal 

** If right-to-left terminal get primary and secondary status 

** and see what the mode and order of the terminal are.

** 

** assumptions: 

** only a hp150 or hp2392 can be a right-to-left terminal

** TERM set to reflect the terminal type. 

** 

** return value: 

** TRUE if right-to-left terminal 

** FALSE if not right-to-left terminal 

*/ 

static int
rtl_term(term_mode, term_order, term)
nl_mode *term_mode;     /* mode of terminal */
nl_order *term_order;   /* order of terminal */
char **term;            /* terminal name */
{
    char buf[MAX_TBUF];         /* buffer for terminal information */
    struct termio tbuf;         /* buffer for termio structure */
    struct termio tbufsave;     /* save old info */

    /* assume right-to-left terminal is hp150 or hp2392 */
    if ((*term = getenv("TERM")) == NULL) {
        *term = "";             /* TERM not set */
    }
    if (strncmp(*term, "hp150", 5) && strncmp(*term, "hp2392", 6)) {
        return FALSE;
    }

    /* fetch & save current status of terminal driver */
    if (ioctl(1, TCGETA, &tbuf) == -1) {
        Perror("ioctl");
        return FALSE;
    }
    tbufsave = tbuf;

    /* turn off echo to prevent status bytes from appearing on screen */
    tbuf.c_lflag &= ~ECHO;

    /* set status of terminal driver with echo off */
    if (ioctl(1, TCSETAF, &tbuf) == -1) {
        Perror("ioctl");
        return FALSE;
    }

    /* turn off handshaking (G & H straps on) */
    if (fputs(on_straps, stdout) == EOF) {
        Perror("fputs");
        return FALSE;
    }

    /* get alpha-numeric capabilities: ordering is byte 2, bit 4 */
    if (fputs(an_cap, stdout) == EOF) {
        Perror("fputs");
        return FALSE;
    }
    if (!fgets(buf, MAX_TBUF, stdin)) {
        Perror("fgets");
        return FALSE;
    }
    *term_order = (buf[DISPLAY] & ORDER) ? NL_KEY : NL_SCREEN;

    /* get secondary status: mode is byte 13, bit 3 */
    if (fputs(sec_status, stdout) == EOF) {
        Perror("fputs");
        return FALSE;
    }
    if (!fgets(buf, MAX_TBUF, stdin)) {
        Perror("fgets");
        return FALSE;
    }
    *term_mode = (buf[RTL_SEC] & MODE) ? NL_NONLATIN : NL_LATIN;

    /* turn on D1 handshaking (G & H straps off) */
    if (fputs(off_straps, stdout) == EOF) {
        Perror("fputs");
        return FALSE;
    }

    /* restore status of terminal driver */
    if (ioctl(1, TCSETAF, &tbufsave) == -1) {
        Perror("ioctl");
        return FALSE;
    }

    return TRUE;
}


Example 2: Makefile 

FINDMSG = /usr/bin/findmsg
GENCAT = /usr/bin/gencat
LINT = /usr/bin/lint
RM = /bin/rm
CFLAGS = -D_HPUX_SOURCE -O
LDFLAGS = -s
IFLAGS =
LIBS =
SOURCE = rtlcat.c
OBJECT = rtlcat.o

all: rtlcat rtlcat.cat

rtlcat: $(OBJECT)
	$(CC) -o $@ $(OBJECT) $(LDFLAGS) $(LIBS)

rtlcat.cat: rtlcat.msg

# NL_SETN defined once in the first source file or
# NL_SETN defined with different values for each source file
rtlcat.msg: $(SOURCE)
	$(FINDMSG) $(SOURCE) > $@

.msg.cat:
	$(GENCAT) $*.cat $*.msg

.c.o:
	$(CC) -c $(CFLAGS) $(IFLAGS) $<

lint: $(SOURCE)
	$(LINT) -u $(CFLAGS) $(IFLAGS) $(SOURCE) > lint

clean:
	$(RM) -f *.o *.msg lint

clobber: clean
	$(RM) -f rtlcat *.cat

.SUFFIXES: .cat .msg






B 



NLS References 



Following is a list of current NLS documentation in the HP-UX Reference. 



BUILDLANG(1M)     buildlang - generate and display locale.def file
CATGETS(3C)       catgets - get a program message
CATOPEN(3C)       catopen, catclose - open and close a message catalog
                  for reading
ENVIRON(5)        environ - user environment
FINDMSG(1)        findmsg, dumpmsg - create message catalog file for
                  modification
FINDSTR(1)        findstr - find strings for inclusion in message catalogs
FORDER(1)         forder - convert file data order
GENCAT(1)         gencat - generate a formatted message catalog file
HPNLS(5)          hpnls - HP Native Language Support (NLS) Model
ICONV(1)          iconv - code set conversion
ICONV(3C)         iconvsize, iconvopen, iconvclose, iconvlock, ICONV,
                  ICONV1, ICONV2 - code set conversion routines
INSERTMSG(1)      insertmsg - use findstr(1) output to insert calls to
                  catgets(3C)
LANG(5)           lang - description of supported languages
LANGINFO(5)       langinfo - language information constants
LOCALECONV(3C)    localeconv - query the numeric formatting conventions
                  of the current locale
NL_LANGINFO(3C)   nl_langinfo - language information
NLJUST(1)         nljust - justify lines, left or right, for printing
NLSINFO(1)        nlsinfo - display native language support information
SETLOCALE(3C)     setlocale, getlocale - set and get the locale of a
                  program
STRORD(3C)        strord - convert string data order






Previous Usage 



c 



The items identified under PREVIOUS have been superseded by the 
corresponding item under CURRENT. They are supported but will be 
withdrawn at some time. Continued use is not recommended. 
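
As an illustration of moving from the older macros to the multi-byte routines, the sketch below finds the base name of a path by stepping through it with the standard C mblen() routine instead of ADVANCE and CHARAT. The function name mb_basename is hypothetical, and the sketch assumes the program has already called setlocale() so that mblen() follows the codeset in use.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the part of path that
 * follows the last '/', stepping one (possibly multi-byte) character
 * at a time so that a second byte that happens to equal '/' is never
 * mistaken for a separator. */
const char *mb_basename(const char *path)
{
    const char *base = path;
    size_t left = strlen(path);

    (void) mblen(NULL, 0);          /* reset any shift state */
    while (left > 0) {
        int len = mblen(path, left);

        if (len < 1)
            len = 1;                /* invalid byte: step over it */
        if (len == 1 && *path == '/')
            base = path + 1;
        path += len;
        left -= (size_t) len;
    }
    return base;
}
```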



Previous       Current        Reference          Notes

BYTE_STATUS    mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
byte_status    mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
catgetmsg      catgets        catgets(3C)        Withdrawn by X/Open
catread        catgets        catgets(3C)
CHARAT         mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
FIRSTOF2       mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
firstof2       mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
fprintmsg      fprintf        printf(3C)
idtolang       (none)
langid(5)      (none)
langinfo       nl_langinfo    nl_langinfo(3C)






Previous       Current        Reference          Notes

langinit       setlocale      setlocale(3C)
n-computer     C              lang(5)            See discussion in lang(5)
nl_asctime     nl_ascxtime    ctime(3C)
nl_asctime     strftime       strftime(3C)
nl_atof        atof           strtod(3C)
nl_ctime       nl_cxtime      ctime(3C)
nl_fprintf     fprintf        string(3C)
nl_fscanf      fscanf         string(3C)
nl_gcvt        gcvt           ecvt(3C)
nl_isalpha     isalpha        ctype(3C)
nl_isctrl      isctrl         ctype(3C)
nl_isgraph     isgraph        ctype(3C)
nl_isprint     isprint        ctype(3C)
nl_isspace     isspace        ctype(3C)
nl_isupper     isupper        ctype(3C)
nl_sprintf     sprintf        string(3C)
nl_sscanf      sscanf         string(3C)
nl_strcmp      strcoll        string(3C)
nl_strtod      strtod         strtod(3C)




PCHARADV 


mbtowc 


multibyte(3C) 


Use of multi-byte routines 



PCHARADV 



PCHAR 



WCHARADV 



WCHAR 



nl_tools_16(3C) 
nl_tools_16(3C) 



recommended for portability 

Use of multi-byte routines 
recommended for portability 

Use of multi-byte routines 
recommended for portability 






Previous       Current        Reference          Notes

printmsg       printf         printf(3C)
SECOF2         mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
secof2         mbtowc         multibyte(3C)      Use of multi-byte routines
                                                 recommended for portability
sprintmsg      sprintf        printf(3C)
strcmp[8|16]   strcoll        string(3C)
strncmp[8|16]  nl_strncmp     string(3C)
WCHARADV       mbtowc         multibyte(3C)






Languages and Codesets 



Following are native languages and the HP codesets that support them. 



Language          Codeset

American          ROMAN8
Arabic            ARABIC8
Canadian French   ROMAN8
Chinese-s         PRC15
Chinese-t         ROC15
Danish            ROMAN8
Dutch             ROMAN8
English           ROMAN8
Finnish           ROMAN8
French            ROMAN8
German            ROMAN8
Greek             GREEK8
Icelandic         ROMAN8
Italian           ROMAN8
Japanese          JAPAN15
Japanese.ujis     UJIS
Katakana          KANA8
Korean            KOREA15
Norwegian         ROMAN8
Portuguese        ROMAN8
Spanish           ROMAN8
Swedish           ROMAN8
Turkish           TURKISH8
Western Arabic    ARABIC8






Glossary 



Note: For additional information on terms used with HP-UX, please
see the Glossary section of HP-UX Reference, vol. 1.



adaptation 

As used in this document, adaptation is the process of making a product 
and all that goes with it (including documentation, training, distribution, 
support, etc.) suitable for, and available to, markets outside the country of 
its origin. Adaptation includes, but is not limited to, internationalization 
and localization. 

alternate character set 

A codeset used to represent special, ancillary characters. 

application program 

A program which performs a specific task for the end-user. 

ARABIC8 

The Hewlett-Packard supported 8-bit codeset for the Arabic language. 

bit 

A contraction of Binary digiT. A bit can have a value of 0 or 1.

byte

A unit of data storage consisting of 8 bits. A byte can represent one 
ASCII, KANA8, GREEK8, TURKISH8, ARABIC8, or ROMAN8 
character. 

byte redefinition 

Corruption of a multi-byte character when any one of its bytes is treated as 
a 1-byte character. 




C (locale) 

An invented, artificial computer locale which specifies the minimal 
environment for C translation. C locale is the default when natural 
languages/locales are not installed or are not called by a program. 

character set 

A set of symbols required to write a language. Different languages often 
have different character sets. 

coded character set

See codeset.

codeset 

A set of unambiguous rules that establishes a one-to-one relationship 
between each character of a character set and the numeric representation 
for that character. 

7-bit: A codeset that uses seven bits to represent a collection of
characters, control codes, and the space character. A 7-bit codeset allows
a maximum of 128 characters, which does not accommodate international
languages. ASCII is an example of a 7-bit codeset.

8-bit: A codeset that uses all eight bits of a single byte to encode each
character in the codeset. These codesets are designed so that the range 0
through 127 is ASCII, including the control codes and space character.
Non-ASCII characters appear in the range 128 through 255. (Note that
the KANA8 character set substitutes the yen symbol for the backslash
symbol, so it is not a superset of ASCII.)

multi-byte: A codeset that uses two or more bytes to encode characters.
Languages such as Chinese, Japanese, and Korean require more than 256
characters, which is the maximum provided by 8-bit character sets. A
full 16 bits (2 bytes) per character allows definition of 65,536 unique
character codes. The HP-15 encoding scheme limits practical use of all
16 bits, thus limiting the size of the codeset to 49,284 characters. The
HP-16 encoding scheme limits the size to 35,344 characters. Under
different circumstances, 2 bytes can be interpreted as one multi-byte
value or two single-byte values.

single-byte: A 7-bit or 8-bit codeset.




context analysis 

The process of determining the proper shape of a character based on its 
position in the word. For some languages, a character can have a different 
shape if it is at the start of a word, in the middle of a word, at the end of 
a word, or standing alone. Currently, context analysis is defined for the 
Middle Eastern and North African Arabic languages. 

control character or control code 

A nonprinting member of a character set that produces action in a device. 
In ASCII, control characters are those in the code range 0 through 31, 
and 127. These values and the space character, with code value 32, are 
not used for any other purpose. Code values 128 through 160 and 255 are 
also treated as control codes in some cases. Most control characters can
be generated by simultaneously pressing a displayable character key and
the CTRL key.

data directionality 

Refers to the direction text will appear on the screen; left-to-right or 
right-to-left. 

data ordering 

Refers to the arrangement of data within a file, internal buffer, or during a 
transfer to or from peripherals. The modes of data ordering are "keyboard
(phonetic) order" and "screen order".

default search path 

The sequence of directory prefixes that sh, csh, and other HP-UX
commands apply when searching for a file known by an incomplete
path name. It is defined by PATH in environ. login sets
PATH=.:/bin:/usr/bin, which means that your working directory is the first
directory searched, followed by /bin, followed by /usr/bin.

directionality 

See data directionality.

downshifting

The provision for producing lowercase letters by using the Shift key.






ECMA 

The European Computer Manufacturers Association standards 
organization. 

GREEK8 

The Hewlett-Packard supported 8-bit codeset for the Greek language.

Hindi digits

An alternate representation of numbers used in some Arabic countries. 
Other Arabic countries use the Latin representation of numbers. 

HP-8 

The HP implementation of the ISO (International Standards Organization)
8-bit character codeset.

HP-15 

The HP encoding scheme for internal operating system representation of 
16-bit data that uses only 15 bits for characters. 

HP-16 

The HP encoding scheme for 16-bit codesets used for communicating 8- 
and 16-bit data between a peripheral and a computer. This is derived 
from the ISO (International Standards Organization) multi-byte character 
processing standard. By using 16-bit data, 35,344 characters can be
represented.

ideogram or ideograph 

A pictographic symbol used to represent whole words or syllables. 

internationalization 

Design and modification of products to make them localizable. For 
example, modification of application programs before compilation to make 
use of locale-independent library routines and to ensure that single-byte 
and multi-byte data can be handled in a locale-sensitive way by hardware 
and software. 

ISO7

International Standards Organization 7-bit character substitution, in which 
the character graphics associated with some less-used ASCII codes are 
changed to other characters needed for a particular language. 




JAPAN15 

The HP-supported 16-bit codeset for the Japanese language.

KANA8

The HP-supported 8-bit codeset for support of phonetic Japanese
(Katakana).

Kanji 

The Japanese ideographic codeset based on Chinese characters. The set 
consists of roughly 50,000 characters. 

Katakana 

The Japanese phonetic codeset typically used in formal writing. The set 
consists of 64 characters, including punctuation. 

keyboard order 

Characters arranged the way they are entered from the keyboard.

KOREA15

The HP-supported 16-bit codeset for the Korean (Hangul) language.

LANG

The HP-UX environment variable (LANGuage) that should be set to the 
name of the locale corresponding to the native language to be used. 

LANGOPTS 

The HP-UX environment variable that defines the options for mode (Latin 
or non-Latin) and data order (keyboard or screen). 

language: 

computer: An artificial language consisting of a set of characters and 
rules, with specific functions for computer programming. The C language 
is an example of a computer language. 

native: The first language of the user. Alternatives are "national" or 
"local" language. 

natural: The spoken or written language used by humans.

programming: Alternative to "computer language".




supported: The computer-implemented version of a written or spoken 
language. See /usr/lib/nls/config for a list of NLS-supported
languages. 

Latin mode 

The mode where the terminal is configured so that the text display order is 
from left to right. 

library 

A set of subroutines contained in a file that can be accessed by a user 
program. 

library routine 

A subroutine contained in a library file used to perform a task.

literal

Computer code, displayed as it would appear in the output, or as it would 
be typed in. 

local customs 

The standard way dates, times, currency, numeric quantities, and collation 
are written in a particular region or country. Also known as country or 
local conventions. 

locale 

That part of the environment of a process which contains international 
data. 

local environment files 

Files external to the code of a software product containing locale-dependent 
information such as messages, prompts, commands, icons, etc. Localization 
centers are responsible for the construction and/or translation of these files. 

localizability 

The attribute of a hardware or software product which allows it to be 
localized through predefined steps (normally without redesign or recoding).
The outcome of the internationalization effort. 




localization 

The adaptation of an internationalized hardware/software system for use in 
different countries or local environments. 

localization center 

An organization in a country or region that provides software or hardware 
products specifically tailored for use in that country or region. 

message catalog 

The external file containing prompts, responses to prompts, and error 
messages in the user's native language. 

message catalog system 

A set of tools developed by Hewlett-Packard to extract print statements 
from C programs and place them in, or retrieve them from, the message 
catalog. 

mode 

The order in which text is displayed: Latin (left-to-right), or non-Latin 
(right-to-left). 

n-computer (native-computer) 

An invented, artificial computer locale which specifies the minimal 
environment for C translation. Now replaced by the C locale. 

Native Language Support (NLS) 

The HP set of software facilities within the HP-UX system which supports 
proper handling of native language data, including character data, country 
formatting conventions, and other local customs. 

non-Latin mode 

The mode where the terminal is configured so that the text display order is 
from right to left. 

NLS Coordinator 

This person handles the responsibility for localization of software and may 
also participate in installing and administering NLS aspects of a system. 




opposite language 

When the terminal is in non-Latin mode, Latin characters are the
"opposite language", and when the terminal is in Latin mode, non-Latin
characters are the "opposite language". NLS allows both Latin and
non-Latin characters to appear on the same line. Opposite language 
characters are inserted on the screen in the opposite direction by using an 
opposite language key. 

order 

The temporal order in which data is used: screen order (the order in which 
characters are displayed) or keyboard order (the order in which the user 
enters keystrokes).

path name 

A sequence of directory names separated by slashes (/), and ending in any 
type of file name. 

phonetic order 

The ordering of characters by the way they are read or spoken.

PRC15

The HP-supported 16-bit codeset for Simplified Chinese, the language of 
the People's Republic of China. 

prelocalize; prelocalization

See internationalization.

programming language

See language.

radix character 

The actual or implied character that separates the integer portion of a 
number from the fractional portion. 

ROC15

The HP-supported 16-bit codeset for Traditional Chinese, the language of 
the Republic of China. 




ROMAN8

The HP-supported 8-bit codeset for Europe. 

routine 

See library routine. 

screen order 

The order in which characters appear on the screen.

syntax

The rules governing sentence structure in a spoken language, or statement 
structure in a computer language such as that of a compiler program. 

TURKISH8 

The HP-supported 8-bit codeset for the Turkish language.

upshifting

The means by which the peripheral produces uppercase letters by using the
Shift key.

USASCII 

A less common name for ASCII, the American Standard Code for 
Information Interchange. 

X/Open 

An international standards group dedicated to creating a free and open 
market. The group is concerned with standards selection and adoption, 
using International Standards where they exist. 




Index 



A 

ADVANCE macro, 4-4 
american locale, 6-3 
applications designer, 2-1, 2-3 
ASCII, 2-4 

asctime library routine, 4-8 
atof library routine, 4-8 

B 

buildlang, 5-1 
example, 6-3 
buildlang -d, 7-3 
buildlang, using, 6-4 
"byte redefinition" , 4-3 
byte_status library routine, 4-4 
BYTE_STATUS macro, 4-4 

C

case, 2-5 

catclose command, 4-11, 4-13 
catgetmsg command, 4-11 
catgets, 4-13, 4-16, 7-5 

default message, 4-14 
catgets command, 4-11 
catopen, 4-1, 4-16, 7-5 
catopen command, 4-11
character 

16-bit, 2-4 

8-bit, 2-4 

clustered, 2-6 

comparison, 2-6 

expanded, 2-6 



identify traits, 4-8 

multi-byte, 2-4, 2-8 
character handling, 2-4 
character pointer manipulation, 4-4 
character sets 

ASCII, 5-4 

European, 5-4 

KANA8, 5-4 

Katakana, 5-4 

multi-byte, 2-5, 5-4 

non-HP, 5-5 

peripherals for, 5-4 

Roman8, 5-4 

single-byte, 5-4 
character set (see also 7-bit, 8-bit, 16-bit) 

ideographic, 2-5 
CHARADV macro, 4-4 
CHARAT macro, 4-4 
Chinese collating sequence, 2-5 
C locale as default, 7-5 
C locale messages, 6-2 
clustered characters, 2-6 
.codeset, 5-2 
codeset 

conversion, 7-1 

multi-byte, 4-3 
codesets 

HP, D-1

multi-byte, programming with, 4-3

support, D-1
collating sequence, 2-5 
comparing characters, 2-6 






comparing strings, 2-6 
compiling message catalogs, 4-16 
concatenation 

right-to-left, A-1
conventions 

manual, 1-4 
conversion of existing programs, 4-10 
conversion specification, 7-6
creating a message catalog, 4-10 
C Shell, 5-3 
ctime, 4-10 

ctime library routine, 4-8 
ctype(3C) library routine, 4-8 
currency, 2-10 

D 

data directionality, 2-6, 2-8, 7-1 

data formatting, 7-6 

data integrity, 4-3 

data order, 7-1 

date, 6-3 

date.cat, 6-2

date display, 7-6 

days, display, 2-10 

default message 

alternatives, 4-14 

in catgets call, 4-14 

in default message catalog, 4-14 
default native language, 5-3 
default string, 4-13 
directionality 

data, 7-1 
documentation, NLS, B-1
dumpmsg, 6-2 
dumpmsg command, 4-27 

E 

ecvt library routine, 4-8 
end-user, 2-3 
environment changes, 3-3 
environment variable 



LANGOPTS, 5-3 

NLSPATH, 5-3 
environment variables 

description, 3-1

example, 3-3 

LANG, 4-11, 6-1 

LANGOPTS, 6-1 

LC_ categories, 6-1 

NLSPATH, 4-11, 6-1 

NLSPATH, 4-11

setting, 3-1, 6-1 
error messages, A-4 
/etc/csh.login, 5-3
/etc/profile, 5-3 
exec 

calls to, 7-6 
expanded characters, 2-6 

F 

file hierarchy, 5-1 

file system

finding, 5-1 

organization, 5-1 
findmsg, 4-19 
findmsg command, 4-27
findstr, 4-23
findstr command, 4-22
firstof2 library routine, 4-4
FIRSTOF2 macro, 4-4
fopen, 4-22
forder, 7-3

format of source message files, 4-15 
formatting 

date and time, 4-8 

monetary, 4-8 

numeric, 4-8 
fprintf, 7-3
fprintf library routine, 4-8
fscanf, 7-3






G 

gcvt library routine, 4-8 
gencat, 4-14, 4-16, 4-24 

example, 6-3 
gencat command, 4-16, 4-27 
generating message catalogs, 4-16 
getlocale, 7-3 
Gregorian calendar, 2-10 
grep, 4-10 

guidelines for message catalogs, 4-28

H

HP-UX commands 
message catalogs, 6-2 

I 

iconv, 7-1 

identifying character size, 4-4 
identifying character traits, 4-8 
initialization, 7-4 
initializing 

a program, 4-1 

program messages, 4-1 

standard program, 4-1 
insertmsg, 4-23 

example, 4-23
insertmsg command, 4-23 
installing optional locales, 5-4 
internationalization, 2-1, 2-3 
Internationalization, Glossary-4 
isalpha library routine, 4-8 
ISO7, 5-4

isupper library routine, 4-8

K

KANA8, 5-4 
Kanji, 2-5 
Katakana, 5-4 
keyboard order, 7-1 
Korn Shell, 5-3 



L 

LANG, 3-1, 4-2, 4-24 

LANG environment variable, 4-11 

LANGOPTS, 3-1, 7-1, 7-3 

language 

name, 5-1 

number (ID), 5-1 

supported, 5-4 
Latin mode, 7-1 
LC_ categories, 4-2
LC_COLLATE, 3-1
LC_CTYPE, 3-1
LC_MONETARY, 3-1
LC_NUMERIC, 3-1
lconv, 4-9
LC_TIME, 3-1
libraries with messages, 4-20 
local customs 

character processing, 4-8 

string processing, 4-8 
local customs (conventions), 2-1, 2-4, 2-9 
locale 

directories for, 5-2 

form of, 5-2 
locale 

buildlang, 6-3 

creating new, 6-3 

default, 5-4 

displaying, 3-5 

testing, 3-5 

verifying installation, 6-5 
localeconv, 7-3 

example, 4-9 
locale information, 7-3 
localization, 2-1, 2-3 
.login, 3-4, 5-3

M 

make, A-16
make files, 4-28 
manual conventions, 1-3 






message catalog 

cookbook, 2-11 

new, 4-14 

overview, 2-3, 2-11 

using gencat, 4-14
message catalogs, 4-10 

automated creation of, 4-28 

C locale, 7-7 

closing, 4-13 

compiling, 6-3 

compiling, 4-16

conversion of existing programs for, 4-21

"cookbook", 4-28, 6-2 

creating, 4-10 

default, 4-14, 7-5 

default error messages, 4-28 

for HP-UX commands, 6-2 

generating, 4-16 

guidelines, 4-28 

HP-UX, 3-4 

installation, 5-3 

installing, 4-25, 6-3 

location, 5-3 

message numbers, 4-28 

opening, 4-11 

opening and closing, 4-20 

programming example, 4-16, A-1

test directories, 4-24 

testing, 4-24 

translating, 6-2 

updating, 4-10, 4-26 

using correct, 4-19 

using revision code, 4-19 
message numbers, 4-27 
messages, 2-4, 2-11 

conversion of existing programs for, 4-21

in arrays, 4-18
in variables, 4-17
printf/scanf, 7-6



retrieving, 4-13 
mode, 7-1 

months, display, 2-10 
multi-byte, A-1

macros, 4-4 

processing, 4-3 

program conversion, 4-7 

programming with, 4-3 
multi-byte character codes, 2-5 
multi-byte routines, usage reference, C-1

N 

native language, 2-1 
native languages 
supported, D-1
nl_asctime library routine, 4-8
nl_ctype(3C) library routine, 4-11
nl_cxtime library routine, 4-8
nl_fprintf library routine, 4-8
NLIO, 5-5 
NLIO system, 5-4 
nljust, 7-3 

nl_printf library routine, 4-8 
NLS 

aspects of, 2-4 

definition, 2-1 

support, 2-1 
NLS Coordinator, 3-1, 5-3, 6-1 

tasks, 3-1 

translation activities, 3-1 
NLS documentation, B-1
NL_SETD, 4-15
nlsinfo, 7-3

NLSPATH, 3-1, 4-2, 4-24, 7-5
NLSPATH environment variable, 4-11
NLS routines
status, C-1
nl_strncmp, 4-9
non-ASCII string collation, 2-6
non-Latin mode, 7-1 
number representation, 2-9 






numeric formatting, 2-9 

O

obsolete, NLS routines, C-1
obsolete routines, X/Open, C-1
opening message catalogs, 4-11 
order 
data, 7-1 

P 

parity, 3-4 

PCHARADV, replaced by WCHARADV, 4-4 

PCHAR, replaced by WCHAR, 4-4 

peripheral configuration, 5-4 

peripherals, 5-4 

phonetic order, 7-1 

pointer manipulation, character, 4-4 

printf, 4-17

conversion specification, 7-6 

order of arguments, 7-6 
printf library routine, 4-8, 4-10 
processing order, A-1

profile, 3-4, 5-3 
program initialization 

standard, 7-5 
programmer, 2-1, 2-3 
programming 

example, A-1, A-4
programs 

conversion of existing, 4-10 
putenv, 7-6 

Q 

$quote directive, 4-16

R

regular expressions, 2-8 
retrieving messages, 4-11 
revision code, 4-19 
right-to-left order, A-1
right-to-left terminal, A-3 



ROMAN8 character set, 2-4 
routines, NLS, C-1

S 

screen order, 7-1 

secof2 library routine, 4-4 

SECOF2 macro, 4-4

$set, 4-27 

$set directive, 4-15 

setlocale, 4-1-2, 7-5-6, A-1

setlocale, 7-4 

shifting, 2-5 

single-byte 

program conversion, 4-7 
software developer, 7-1 
sorting, 2-5 
source file 

editing, 4-24 

management, 4-26 

multi-file management, 4-26 
source message file format, 4-15 
source program 

editing, 4-23 
special locales, 7-4 
status, NLS routines, C-1
strcmp, 4-10 

example, 4-9 
strcoll, 4-9 

strftime library routine, 4-8 
string comparison, 2-6 
string files 

removing non-messages from, 4-22 
strncmp, 4-9 

strtod(3C) library routine, 4-8
strord 

example, 7-3 
strtod library routine, 4-8 
strxfrm

example, 4-9 
support 

aspects of, 2-4 






system administrator, 2-3, 5-1
System Administrator, 3-1, 6-3 
tasks, 3-1 

T 

terminal 

setting, 3-4 

stty, 3-4 
terminal constants 

right-to-left, A-3
terminals 

8-bit mode, 5-4 
_territory, 5-2
time, display, 2-10 
toupper library routine, 4-8 
translating 

problems and solutions, 6-2 



U 

usage, previous, C-1
USASCII character set, 2-4
/usr/lib/nls/config directory, 5-1, 5-3
/usr/lib/nls/language-name, 5-1

W 

WCHARADV macro, 4-4 
WCHAR macro, 4-4 
weeks, display, 2-10 

wide-characters, conversion with multi-byte, 4-6

wide-characters, programming example, 4-6

wide-characters, programming with, 4-6 






HEWLETT 
PACKARD 



HP Part Number 
97089-90058 

Microfiche No. 97089-99058 
Printed in U.S.A. E0989 




For Internal Use Only 



